SECOND-ORDER MATRIX CONCENTRATION INEQUALITIES


JOEL A. TROPP


Abstract. Matrix concentration inequalities give bounds for the spectral-norm deviation of a random matrix from its
expected value. These results have a weak dimensional dependence that is sometimes, but not always, necessary. This
paper identifies one of the sources of the dimensional term and exploits this insight to develop sharper matrix concentration
inequalities. In particular, this analysis delivers two refinements of the matrix Khintchine inequality that use information
beyond the matrix variance to reduce or eliminate the dimensional dependence.


1. Motivation 

Matrix concentration inequalities provide spectral information about a random matrix that depends smoothly 
on many independent random variables. In recent years, these results have become a dominant tool in applied 
random matrix theory. There are several reasons for the success of this approach. 

• Flexibility. Matrix concentration applies to a wide range of random matrix models. In particular, we can 
obtain bounds for the spectral norm of a sum of independent random matrices in terms of the properties 
of the summands. 

• Ease of Use. For many applications, matrix concentration tools require only a small amount of matrix 
analysis. No expertise in random matrix theory is required to invoke the results. 

• Power. For a large class of examples, including independent sums, matrix concentration bounds are
provably close to optimal.

See the monograph [Tro15] for an overview of this theory and a comprehensive bibliography.

The matrix concentration inequalities in the literature are suboptimal for certain examples because of a weak 
dependence on the dimension of the random matrix. Removing this dimensional term is difficult because there are 
many situations where it is necessary. The purpose of this paper is to identify one of the sources of the dimensional 
factor. Using this insight, we will develop some new matrix concentration inequalities that are qualitatively better 
than the current generation of results, although they sacrifice some of our desiderata. Ultimately, we hope that this 
line of research will lead to general tools for applied random matrix theory that are flexible, easy to use, and that 
give sharp results in most cases. 


2. The Matrix Khintchine Inequality 

To set the stage, we present and discuss the primordial matrix concentration result, the matrix Khintchine
inequality, which describes the behavior of a special random matrix model, called a matrix Gaussian series. This
result already exhibits the key features of more sophisticated matrix concentration inequalities, and it can be used
to derive concentration bounds for more general models. As such, the matrix Khintchine inequality serves as a
natural starting point for deeper investigations.

2.1. Matrix Gaussian Series. In this work, we focus on an important class of random matrices that has a lot of 
modeling power but still supports an interesting theory. 

Definition 2.1 (Matrix Gaussian Series). Consider fixed Hermitian matrices $H_1, \dots, H_n$ with common dimension $d$,
and let $\{\gamma_1, \dots, \gamma_n\}$ be an independent family of standard normal random variables. Construct the random matrix

$$X := \sum_{i=1}^n \gamma_i H_i. \tag{2.1}$$


Date: 13 March 2015. Revised 21 April 2015 and 3 August 2016. 

2010 Mathematics Subject Classification. Primary: 60B20. Secondary: 60F10, 60G50, 60G42. 
Key words and phrases. Concentration inequality, moment inequality, random matrix. 
Email: jtropp@cms.caltech.edu. Tel: 626.395.5957.




We refer to a random matrix with this form as a matrix Gaussian series with Hermitian coefficients or, for brevity, 
an Hermitian matrix Gaussian series. 

Matrix Gaussian series enjoy a surprising amount of modeling power. It is easy to see that we can express
any random Hermitian matrix with jointly Gaussian entries in the form (2.1). More generally, we can use matrix
Gaussian series to analyze a sum of independent, zero-mean, random, Hermitian matrices $Y_1, \dots, Y_n$. Indeed, for
any norm $|||\cdot|||$ on matrices,

$$\mathbb{E}\, \Big|\Big|\Big| \sum_{i=1}^n Y_i \Big|\Big|\Big| \le \sqrt{2\pi} \cdot \mathbb{E}\Big[ \mathbb{E}_{\gamma} \Big|\Big|\Big| \sum_{i=1}^n \gamma_i Y_i \Big|\Big|\Big| \Big]. \tag{2.2}$$

The process of passing from an independent sum to a conditional Gaussian series is called symmetrization. See [LT91 , 
Lem. 6.3 and Eqn. (4.8)] for details about this calculation. Furthermore, some techniques for Gaussian series can 
be adapted to study independent sums directly without the artifice of symmetrization. 

Note that our restriction to Hermitian matrices is not really a limitation. We can also analyze a rectangular
matrix $Z$ with jointly Gaussian entries by working with the Hermitian dilation of $Z$, sometimes known as the
Jordan-Wielandt matrix. See [Tro15, Sec. 2.1.16] for more information on this approach.

2.2. The Matrix Variance. Many matrix concentration inequalities are expressed most naturally in terms of a
matrix extension of the variance.

Definition 2.2 (Matrix Variance). Let $X$ be a random Hermitian matrix. The matrix variance is the deterministic
matrix

$$\operatorname{Var}(X) := \mathbb{E}(X - \mathbb{E}X)^2 = \mathbb{E}X^2 - (\mathbb{E}X)^2.$$

We use the convention that the power binds before the expectation.

In particular, consider a matrix Gaussian series $X := \sum_{i=1}^n \gamma_i H_i$. It is easy to verify that

$$\operatorname{Var}(X) = \mathbb{E}X^2 = \sum_{i,j=1}^n \mathbb{E}[\gamma_i \gamma_j] \cdot H_i H_j = \sum_{i=1}^n H_i^2.$$

We see that the matrix variance has a clean expression in terms of the coefficients of the Gaussian series, so it is 
easy to compute in practice. 
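As a small illustration of this formula, the sketch below (not part of the original paper) computes the matrix variance of a Gaussian series directly from its coefficients as the sum of their squares. The helper names `matmul`, `matadd`, and `matrix_variance` are hypothetical, and the 2 x 2 coefficients (the Pauli X and Z matrices) are an assumed example chosen only because their squares are easy to check by hand.

```python
def matmul(a, b):
    """Multiply two square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def matadd(a, b):
    """Add two matrices entrywise."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def matrix_variance(coeffs):
    """Return sum_i H_i^2, the matrix variance of X = sum_i gamma_i H_i."""
    d = len(coeffs[0])
    total = [[0] * d for _ in range(d)]
    for h in coeffs:
        total = matadd(total, matmul(h, h))
    return total


# Assumed example: two real symmetric coefficients (Pauli X and Pauli Z).
pauli_x = [[0, 1], [1, 0]]
pauli_z = [[1, 0], [0, -1]]

# Each coefficient squares to the identity, so the variance is 2 * I.
print(matrix_variance([pauli_x, pauli_z]))
```

The computation needs only the coefficient matrices, not any samples of the random matrix, which is the sense in which the variance is easy to compute.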

2.3. The Matrix Khintchine Inequality. The matrix Khintchine inequality is a fundamental fact about the
behavior of matrix Gaussian series. The first version of this result was established by Lust-Piquard [LP86], and the
constants were refined in the papers [Pis98, Buc01]. The version here is adapted from [MJC+14, Sec. 7.1].

Proposition 2.3 (Matrix Khintchine). Consider an Hermitian matrix Gaussian series $X := \sum_{i=1}^n \gamma_i H_i$, as in (2.1).
Introduce the matrix standard deviation parameter

$$\sigma_q(X) := \big\| \operatorname{Var}(X)^{1/2} \big\|_q = \Big\| \Big( \sum_{i=1}^n H_i^2 \Big)^{1/2} \Big\|_q \quad\text{for } q \ge 1. \tag{2.3}$$

Then, for each integer $p \ge 1$,

$$\sigma_{2p}(X) \le \big( \mathbb{E} \|X\|_{2p}^{2p} \big)^{1/(2p)} \le \sqrt{2p - 1} \cdot \sigma_{2p}(X). \tag{2.4}$$

The symbol $\|\cdot\|_q$ denotes the Schatten $q$-norm.

The lower bound in (2.4) is simply Jensen’s inequality. Section 7 contains a short proof of the upper bound. 

The matrix Khintchine inequality also yields an estimate for the spectral norm of a matrix Gaussian series. This 
type of result is often more useful in practice. 

Corollary 2.4 (Matrix Khintchine: Spectral Norm). Consider an Hermitian matrix Gaussian series $X := \sum_{i=1}^n \gamma_i H_i$
with dimension $d$, as in (2.1). Introduce the matrix standard deviation parameter

$$\sigma(X) := \| \operatorname{Var}(X) \|^{1/2} = \Big\| \sum_{i=1}^n H_i^2 \Big\|^{1/2}.$$

Then

$$\frac{1}{\sqrt{2}} \cdot \sigma(X) \le \mathbb{E}\|X\| \le \sqrt{e(1 + 2\log d)} \cdot \sigma(X). \tag{2.5}$$

The symbol $\|\cdot\|$ denotes the spectral norm, also known as the $\ell_2$ operator norm.
Proof Sketch. For the upper bound, observe that

$$\mathbb{E}\|X\| \le \big( \mathbb{E}\|X\|_{2p}^{2p} \big)^{1/(2p)} \le \sqrt{2p-1} \cdot \big\| \operatorname{Var}(X)^{1/2} \big\|_{2p} \le \sqrt{2p-1} \cdot d^{1/(2p)} \, \| \operatorname{Var}(X) \|^{1/2}.$$

Indeed, the spectral norm is bounded above by the Schatten $2p$-norm, and we can apply Lyapunov's inequality
to increase the order of the moment from one to $2p$. Invoke Proposition 2.3, and bound the trace in terms of the
spectral norm again. Finally, set $p = \lceil \log d \rceil$, and simplify the constants.

For the lower bound, note that

$$\mathbb{E}\|X\| \ge \frac{1}{\sqrt{2}} \big( \mathbb{E}\|X\|^2 \big)^{1/2} = \frac{1}{\sqrt{2}} \big( \mathbb{E}\|X^2\| \big)^{1/2} \ge \frac{1}{\sqrt{2}} \, \| \operatorname{Var}(X) \|^{1/2}.$$

The first relation follows from the optimal Khintchine-Kahane inequality [LO94]; the last is Jensen's. □
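The final step of the upper bound, "simplify the constants", can be checked numerically. The sketch below is an illustration only: it assumes the constant $\sqrt{2p-1}$ from Proposition 2.3 and the dimensional factor $d^{1/(2p)}$ from the trace bound, takes $p = \lceil \log d \rceil$ (forced to be at least 1, which is an assumption made here so that small $d$ is covered), and compares the product with $\sqrt{e(1 + 2\log d)}$. The function names are hypothetical.

```python
import math


def khintchine_upper_factor(d):
    """Return sqrt(2p - 1) * d**(1 / (2p)) with p = ceil(log d), at least 1."""
    p = max(1, math.ceil(math.log(d)))
    return math.sqrt(2 * p - 1) * d ** (1.0 / (2 * p))


def simplified_upper_factor(d):
    """Return the simplified constant sqrt(e * (1 + 2 log d))."""
    return math.sqrt(math.e * (1 + 2 * math.log(d)))


# Print both factors side by side for a few dimensions.
for d in (2, 10, 100, 10_000):
    print(d, khintchine_upper_factor(d), simplified_upper_factor(d))
```

The comparison works because $p \ge \log d$ gives $d^{1/(2p)} \le \sqrt{e}$, while $p < \log d + 1$ gives $2p - 1 < 1 + 2\log d$.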


2.4. Two Examples. The bound (2.5) shows that the matrix standard deviation controls the expected norm of a 
matrix Gaussian series up to a factor that is logarithmic in the dimension of the random matrix. One may wonder 
whether the lower branch or the upper branch of (2.5) gives the more accurate result. In fact, natural examples 
demonstrate that both extremes of behavior occur. 

For an integer $d \ge 1$, define

$$X_{\mathrm{diag}} := \begin{bmatrix} \gamma_1 & & & & \\ & \gamma_2 & & & \\ & & \gamma_3 & & \\ & & & \ddots & \\ & & & & \gamma_d \end{bmatrix}. \tag{2.6}$$

That is, $X_{\mathrm{diag}}$ is a $d \times d$ diagonal matrix whose entries $\{\gamma_i : 1 \le i \le d\}$ are independent standard normal variables.
Second, define

$$X_{\mathrm{goe}} := \frac{1}{\sqrt{2d}} (G + G^*). \tag{2.7}$$

The symbol $^*$ denotes conjugate transposition. Up to scaling, the $d \times d$ random matrix $X_{\mathrm{goe}}$ is the Hermitian
part of a matrix $G$ whose entries $\{\gamma_{ij} : 1 \le i, j \le d\}$ are independent standard normal variables. The sequence
$\{X_{\mathrm{goe}}(d) : d = 1, 2, 3, \dots\}$ is called the Gaussian orthogonal ensemble (GOE).

To apply the matrix Khintchine inequality, we represent each matrix as an Hermitian Gaussian series:

$$X_{\mathrm{diag}} = \sum_{i=1}^d \gamma_i E_{ii} \quad\text{and}\quad X_{\mathrm{goe}} = \frac{1}{\sqrt{2d}} \sum_{i,j=1}^d \gamma_{ij} (E_{ij} + E_{ji}).$$

We have written $E_{ij}$ for the $d \times d$ matrix with a one in the $(i, j)$ position and zeros elsewhere. Respectively, the
matrix variances satisfy

$$\operatorname{Var}(X_{\mathrm{diag}}) = I \quad\text{and}\quad \operatorname{Var}(X_{\mathrm{goe}}) = (1 + d^{-1}) \cdot I.$$

The bound (2.5) delivers

$$\frac{1}{\sqrt{2}} \lesssim \mathbb{E}\|X\| \lesssim \sqrt{2e \log d} \quad\text{for } X = X_{\mathrm{diag}} \text{ or } X = X_{\mathrm{goe}}. \tag{2.8}$$

The relations $\lesssim$ and $\approx$ suppress lower-order terms. In each case, the ratio between the lower and upper bound has
order $\sqrt{\log d}$. The matrix Khintchine inequality does not provide more precise information.

On the other hand, for these examples, detailed spectral information is available:

$$\mathbb{E}\|X_{\mathrm{goe}}\| \approx 2 \quad\text{and}\quad \mathbb{E}\|X_{\mathrm{diag}}\| \approx \sqrt{2 \log d}. \tag{2.9}$$

See [Tao12, Sec. 2.3] for a proof of the result on the GOE matrix; the bound for the diagonal matrix depends on the
familiar calculation of the expected maximum of $d$ independent standard normal random variables. We see that
the norm of the GOE matrix is close to the lower bound provided by (2.5), while the norm of the diagonal matrix is
close to the upper bound.
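The two matrix variances in this section can be verified with exact arithmetic. The sketch below is an illustration, not the paper's method: it builds the coefficient families $\{E_{ii}\}$ and $\{E_{ij} + E_{ji}\}$ (the latter carrying the scalar factor $(2d)^{-1/2}$, which is applied after squaring) and sums their squares. All function names are hypothetical.

```python
from fractions import Fraction


def unit(d, i, j):
    """Return the d x d matrix E_ij with a one in position (i, j)."""
    return [[1 if (r, c) == (i, j) else 0 for c in range(d)] for r in range(d)]


def matmul(a, b):
    """Multiply two square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def sum_of_squares(mats, d):
    """Return sum_i M_i^2 for a list of d x d matrices."""
    total = [[0] * d for _ in range(d)]
    for m in mats:
        sq = matmul(m, m)
        total = [[t + s for t, s in zip(rt, rs)] for rt, rs in zip(total, sq)]
    return total


def var_diag(d):
    """Matrix variance of the diagonal series sum_i gamma_i E_ii."""
    return sum_of_squares([unit(d, i, i) for i in range(d)], d)


def var_goe(d):
    """Matrix variance of (2d)^(-1/2) sum_ij gamma_ij (E_ij + E_ji)."""
    coeffs = []
    for i in range(d):
        for j in range(d):
            e_ij, e_ji = unit(d, i, j), unit(d, j, i)
            coeffs.append([[a + b for a, b in zip(ra, rb)]
                           for ra, rb in zip(e_ij, e_ji)])
    unscaled = sum_of_squares(coeffs, d)
    # The scalar factor (2d)^(-1/2) appears squared in the variance.
    return [[Fraction(x, 2 * d) for x in row] for row in unscaled]


d = 4
print(var_diag(d))  # identity matrix
print(var_goe(d))   # (1 + 1/d) times the identity, here 5/4 on the diagonal
```

Both variances are scalar multiples of the identity with nearly the same scale, which is why the variance alone cannot distinguish the two examples.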
2.5. A Question. Corollary 2.4 shows that the matrix variance controls the expected norm of a matrix Gaussian
series. On the other hand, the two examples in the previous section demonstrate that we need more information 
than the variance to determine the norm up to a constant factor. Therefore, we must ask... 

Are there parameters that allow us to calculate the norm of a matrix Gaussian series more 
precisely than the matrix variance? 

This paper provides the first affirmative answer to this question. 


3. Beyond the Matrix Khintchine Inequality 

This section presents new results that improve on the matrix Khintchine inequality, Proposition 2.3. First, we
motivate the type of parameters that arise when we try to refine this result. Then we define a quantity, called the 
matrix alignment parameter, that describes how the coefficients in the matrix Gaussian series interact with each 
other. In Section 3.3, we use the alignment parameter to state a new bound that provides a uniform improvement 
over the matrix Khintchine inequality. Further refinements are possible if we consider random matrices with highly 
symmetric distributions, so we introduce the class of strongly isotropic random matrices in Section 3.5. Section 3.6 
contains a matrix Khintchine inequality for matrix Gaussian series that are strongly isotropic. This bound is good 
enough to compute the norm of a large GOE matrix exactly. Finally, in Sections 3.7 and 3.8, we discuss extensions 
and related work. 


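3.1. Prospects. What kind of parameters might allow us to refine Proposition 2.3? The result is already an identity
for $p = 1$. For inspiration, let us work out what happens when $p = 2$:

$$\mathbb{E}\|X\|_4^4 = \mathbb{E} \operatorname{tr} X^4 = \sum_{i,j,k,\ell=1}^n \mathbb{E}[\gamma_i \gamma_j \gamma_k \gamma_\ell] \cdot \operatorname{tr}(H_i H_j H_k H_\ell)$$

$$= 2 \operatorname{tr}\Big( \sum_{i=1}^n H_i^2 \Big)^2 + \operatorname{tr}\Big( \sum_{i,j=1}^n H_i H_j H_i H_j \Big) =: 2 \operatorname{tr} \operatorname{Var}(X)^2 + \operatorname{tr} \Delta.$$

We use the convention that powers bind before the trace. The product of Gaussian variables has expectation zero
unless the indices are paired. In the last expression, the first term comes from the cases where $i = j$ and $k = \ell$ or
where $i = \ell$ and $j = k$; the second term comes from the case where $i = k$ and $j = \ell$. Once again, the matrix
variance $\operatorname{Var}(X)$ emerges, but we have a new second-order term $\Delta$ that arises from the summands where the indices
alternate: $(i, j, i, j)$.¹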

In a sense, the matrix $\Delta$ reflects the extent to which the coefficient matrices are aligned. When the family $\{H_i\}$
commutes, the matrix $\Delta = \operatorname{Var}(X)^2$, so the second-order term provides no new information. More generally,
whenever the coefficients commute, the quantity $(\mathbb{E}\|X\|_{2p}^{2p})^{1/(2p)}$ can be expressed in terms of the matrix variance and
the number $p$, and the matrix Khintchine inequality, Proposition 2.3, gives an estimate of the correct order. In other
words, commuting coefficients are the worst possible circumstance. Most previous work on matrix concentration
implicitly uses this worst-case model in the analysis. 

To achieve better results, we need to account for how the coefficient matrices Hi interact with each other. The 
calculation above suggests that the matrix A might contain the information we need. Heuristically, when the 
coefficients fail to commute, the matrix A should be small. As we will see, this idea is fruitful, but we need a 
parameter more discerning than A. 
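The contrast between commuting and non-commuting coefficients can be seen in a tiny example. The sketch below is illustrative only: it evaluates the second-order term $\sum_{i,j} H_i H_j H_i H_j$ and the fourth-moment identity $\mathbb{E} \operatorname{tr} X^4 = 2 \operatorname{tr} \operatorname{Var}(X)^2 + \operatorname{tr}(\text{second-order term})$ for two assumed 2 x 2 families, one commuting ($E_{11}$, $E_{22}$) and one anticommuting (Pauli X and Z). The function names are hypothetical.

```python
def matmul(a, b):
    """Multiply two square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def matadd(a, b):
    """Add two matrices entrywise."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def zeros(d):
    return [[0] * d for _ in range(d)]


def trace(a):
    return sum(a[i][i] for i in range(len(a)))


def variance(coeffs):
    """Return Var(X) = sum_i H_i^2."""
    total = zeros(len(coeffs[0]))
    for h in coeffs:
        total = matadd(total, matmul(h, h))
    return total


def crossing_term(coeffs):
    """Return the second-order term sum_{i,j} H_i H_j H_i H_j."""
    total = zeros(len(coeffs[0]))
    for hi in coeffs:
        for hj in coeffs:
            hihj = matmul(hi, hj)
            total = matadd(total, matmul(hihj, hihj))
    return total


def fourth_moment(coeffs):
    """Return E tr X^4 = 2 tr Var(X)^2 + tr(crossing term)."""
    var = variance(coeffs)
    return 2 * trace(matmul(var, var)) + trace(crossing_term(coeffs))


commuting = [[[1, 0], [0, 0]], [[0, 0], [0, 1]]]       # E_11 and E_22
anticommuting = [[[0, 1], [1, 0]], [[1, 0], [0, -1]]]  # Pauli X and Pauli Z

for name, family in (("commuting", commuting), ("anticommuting", anticommuting)):
    print(name, crossing_term(family), fourth_moment(family))
```

For the commuting family the crossing term equals the squared variance, while for the anticommuting family it vanishes entirely, in line with the heuristic that the second-order term is small when the coefficients fail to commute.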

Let us summarize this discussion in the following observation: 

To improve on the matrix Khintchine inequality, we must quantify the extent to which the 
coefficient matrices commute. 

Our work builds on this intuition to establish new matrix concentration inequalities. 


3.2. The Matrix Alignment Parameter. In this section, we introduce a new parameter for a matrix Gaussian series 
that describes how much the coefficients commute with each other. In later sections, we will present extensions of 
the matrix Khintchine inequality that rely on this parameter. 


Definition 3.1 (Matrix Alignment Parameter). Let $H_1, \dots, H_n$ be Hermitian matrices with dimension $d$. For each
$p \ge 1$, the matrix alignment parameter of this sequence is the quantity

$$w_p := \max_{Q_\ell} \Big\| \Big| \sum_{i,j=1}^n H_i Q_1 H_j Q_2 H_i Q_3 H_j \Big|^{1/4} \Big\|_p \quad\text{and}\quad w := w_\infty. \tag{3.1}$$

The maximum takes place over a triple $(Q_1, Q_2, Q_3)$ of unitary matrices with dimension $d$. The matrix absolute
value is defined as $|B| := (B^* B)^{1/2}$.

Roughly, the matrix alignment parameter (3.1) describes how well the matrices $H_1, \dots, H_n$ can be aligned with each
other under the worst choices of coordinates.

¹A related observation animates the theory of free probability, which gives a fine description of certain large random matrices. The key fact
about centered, free random variables $Y$ and $Z$ is that crossing moments, such as $\varphi(YZYZ)$, must vanish [NS06].

The quantity (3.1) appears mysterious, so it is worth a few paragraphs to clarify its meaning. First, let us compare 
the alignment parameter with the matrix standard deviation parameter (2.3) that appears in the matrix Khintchine 
inequality. 

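Proposition 3.2 (Standard Deviation versus Alignment). Let $H_1, \dots, H_n$ be Hermitian matrices. Define the standard
deviation and alignment parameters

$$\sigma_p := \Big\| \Big( \sum_{i=1}^n H_i^2 \Big)^{1/2} \Big\|_p \quad\text{and}\quad w_p := \max_{Q_\ell} \Big\| \Big| \sum_{i,j=1}^n H_i Q_1 H_j Q_2 H_i Q_3 H_j \Big|^{1/4} \Big\|_p \quad\text{for } p \ge 1.$$

Then

$$w_p \le \sigma_p \quad\text{for all } p \ge 4.$$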

The proof of Proposition 3.2 appears in Section 8.9. 

Next, let us return to the examples in the introduction. In Section 4, we provide detailed calculations of the
standard deviation and alignment parameters. For the diagonal Gaussian series $X_{\mathrm{diag}}$ defined in (2.6), we have

$$\sigma(X_{\mathrm{diag}}) = 1 \quad\text{and}\quad w(X_{\mathrm{diag}}) = 1.$$

For the GOE matrix $X_{\mathrm{goe}}$ defined in (2.7),

$$\sigma(X_{\mathrm{goe}}) = \sqrt{1 + d^{-1}} \quad\text{and}\quad w(X_{\mathrm{goe}}) \le (4/d)^{1/4}. \tag{3.2}$$

The matrix alignment parameter can tell the two examples apart, while the matrix standard deviation cannot!

Remark 3.3 (Notation for Alignment). Here and elsewhere, we abuse notation by writing $w_p(X)$ and $w(X)$ for the
alignment parameter of a matrix Gaussian series $X := \sum_{i=1}^n \gamma_i H_i$, even though $w$ is a function of the coefficient
matrices $H_i$ in the representation of the series.

Remark 3.4 (Are the Unitaries Necessary?). At this stage, it may seem capricious to include the unitary matrices in 
the definition (3.1). In fact, the example in Section 4.3 demonstrates that the alignment parameter would lose its 
value if we were to remove the unitary matrices. On the other hand, there are situations where the unitary matrices 
are not completely arbitrary, as discussed in Section 8.1. 

3.3. A Second-Order Matrix Khintchine Inequality. The first major result of this paper is an improvement on 
the matrix Khintchine inequality. This theorem uses the second-order information in the alignment parameter to 
obtain better bounds. 




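Theorem 3.5 (Second-Order Matrix Khintchine). Consider an Hermitian matrix Gaussian series $X := \sum_{i=1}^n \gamma_i H_i$, as
in (2.1). Define the matrix standard deviation and matrix alignment parameters

$$\sigma_q(X) := \Big\| \Big( \sum_{i=1}^n H_i^2 \Big)^{1/2} \Big\|_q \quad\text{and}\quad w_q(X) := \max_{Q_\ell} \Big\| \Big| \sum_{i,j=1}^n H_i Q_1 H_j Q_2 H_i Q_3 H_j \Big|^{1/4} \Big\|_q \quad\text{for } q \ge 1.$$

The maximum takes place over a triple $(Q_1, Q_2, Q_3)$ of unitary matrices. Then, for each integer $p \ge 3$,

$$\big( \mathbb{E}\|X\|_{2p}^{2p} \big)^{1/(2p)} \le 3 \sqrt[4]{2p - 5} \cdot \sigma_{2p}(X) + \sqrt{2p - 4} \cdot w_{2p}(X). \tag{3.3}$$

The symbol $\|\cdot\|_q$ denotes the Schatten $q$-norm.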

The proof of Theorem 3.5 appears in Section 8. 

We can also derive bounds for the spectral norm of a matrix Gaussian series. 

Corollary 3.6 (Second-Order Matrix Khintchine: Spectral Norm). Consider an Hermitian matrix Gaussian series
$X := \sum_{i=1}^n \gamma_i H_i$ with dimension $d \ge 8$, as in (2.1). Define the matrix standard deviation and matrix alignment
parameters

$$\sigma(X) := \Big\| \sum_{i=1}^n H_i^2 \Big\|^{1/2} \quad\text{and}\quad w(X) := \max_{Q_\ell} \Big\| \sum_{i,j=1}^n H_i Q_1 H_j Q_2 H_i Q_3 H_j \Big\|^{1/4}.$$

The maximum ranges over a triple $(Q_1, Q_2, Q_3)$ of unitary matrices. Then

$$\mathbb{E}\|X\| \le 3 \sigma(X) \sqrt[4]{2e \log d} + w(X) \sqrt{2e \log d}. \tag{3.4}$$

The symbol $\|\cdot\|$ denotes the spectral norm.

The result follows from Theorem 3.5 by setting $p = \lceil \log d \rceil$. The potential gain in (3.4) over (2.5) comes from the
reduction of the power on the first logarithm from one-half to one-quarter.

3.4. Matrix Khintchine versus Second-Order Matrix Khintchine. Let us make some comparisons between
Proposition 2.3 and Theorem 3.5. First, recall that the alignment parameter is dominated by the standard deviation
parameter: $w_q(X) \le \sigma_q(X)$ for $q \ge 4$ because of Proposition 3.2. Therefore, the bound (3.3) implies that

$$\big( \mathbb{E}\|X\|_{2p}^{2p} \big)^{1/(2p)} \le \big( 3 \sqrt[4]{2p - 5} + \sqrt{2p - 4} \big) \cdot \sigma_{2p}(X) \quad\text{for } p = 3, 4, 5, \dots.$$

This is very close to the prediction from Proposition 2.3, so Theorem 3.5 is never significantly worse.

On the other hand, there are situations where Theorem 3.5 gives qualitatively better results. In particular, for
the GOE matrix $X_{\mathrm{goe}}(d)$, the bound (3.3) and the calculation (3.2) yield

$$\mathbb{E}\|X_{\mathrm{goe}}(d)\| \le 3 \sqrt{1 + d^{-1}} \, \sqrt[4]{2e \log d} + \Big( \frac{4}{d} \Big)^{1/4} \sqrt{2e \log d} \quad\text{for } d \ge 8.$$

This estimate beats our first attempt in (2.8), but it still falls short of the correct estimate $\mathbb{E}\|X_{\mathrm{goe}}\| \approx 2$.

3.5. Strongly Isotropic Random Matrices. As we have seen, Theorem 3.5 offers a qualitative improvement over
the matrix Khintchine inequality, Proposition 2.3. Nevertheless, the new result still lacks the power to determine
the norm of the GOE matrix correctly. We can obtain more satisfactory results by specializing our attention to a
class of random matrices with highly symmetric distributions.

Definition 3.7 (Strong Isotropy). Let $X$ be a random Hermitian matrix. We say that $X$ is strongly isotropic when

$$\mathbb{E} X^p = (\mathbb{E}\, \overline{\operatorname{tr}}\, X^p) \cdot I \quad\text{for } p = 0, 1, 2, \dots.$$

The symbol $\overline{\operatorname{tr}}$ denotes the normalized trace: $\overline{\operatorname{tr}}\, A := d^{-1} \operatorname{tr} A$ when $A$ has dimension $d$.

The easiest way to check that a random matrix is strongly isotropic is to exploit symmetry properties of the 
distribution. We offer one of many possible results in this direction [CT14, Lem. 7.1]. 

Proposition 3.8 (Strong Isotropy: Sufficient Condition). Let $X$ be a random Hermitian matrix. Suppose that the
distribution of $X$ is invariant under signed permutation:

$$X \sim \Pi^* X \Pi \quad\text{for every signed permutation } \Pi.$$

Then $X$ is strongly isotropic. The symbol $\sim$ refers to equality of distribution. A signed permutation is a square matrix
that has precisely one nonzero entry in each row and column, this entry taking the values $\pm 1$.

Proof. Suppose that $\Pi$ is a signed permutation, drawn uniformly at random. For $p = 0, 1, 2, 3, \dots$,

$$\mathbb{E} X^p = \mathbb{E}[(\Pi^* X \Pi)^p] = \mathbb{E}\big[ \mathbb{E}[\Pi^* X^p \Pi \mid X] \big] = \mathbb{E}[(\overline{\operatorname{tr}}\, X^p) \cdot I] = (\mathbb{E}\, \overline{\operatorname{tr}}\, X^p) \cdot I.$$

The first relation uses invariance under signed permutation, and the second relies on the fact that signed
permutations are unitary. Averaging a fixed matrix over signed permutations yields the identity times the normalized trace
of the matrix. □
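The averaging fact used in the last step of the proof can be confirmed by brute force in small dimensions. The sketch below is an illustration under the assumption of real matrices, so that the adjoint of a signed permutation is its transpose: it enumerates all $2^d d!$ signed permutations and averages $\Pi^{T} M \Pi$ exactly. The function names and the test matrix are hypothetical.

```python
from fractions import Fraction
from itertools import permutations, product


def signed_permutations(d):
    """Yield every d x d signed permutation matrix as nested lists."""
    for perm in permutations(range(d)):
        for signs in product((1, -1), repeat=d):
            yield [[signs[r] if perm[r] == c else 0 for c in range(d)]
                   for r in range(d)]


def matmul(a, b):
    """Multiply two square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def transpose(a):
    return [list(col) for col in zip(*a)]


def average_conjugation(m):
    """Average P^T M P over all signed permutations P, in exact arithmetic."""
    d = len(m)
    total = [[0] * d for _ in range(d)]
    count = 0
    for p in signed_permutations(d):
        conj = matmul(matmul(transpose(p), m), p)
        total = [[t + c for t, c in zip(rt, rc)] for rt, rc in zip(total, conj)]
        count += 1
    return [[Fraction(x, count) for x in row] for row in total]


# An arbitrary symmetric matrix; its normalized trace is (2 + 3 + 4) / 3 = 3.
m = [[2, 1, 0], [1, 3, -1], [0, -1, 4]]
print(average_conjugation(m))
```

The random signs cancel every off-diagonal entry, and the random permutation averages the diagonal entries, which leaves the normalized trace times the identity.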

Proposition 3.8 applies to many types of random matrices. In particular, the diagonal Gaussian matrix $X_{\mathrm{diag}}$ and
the GOE matrix $X_{\mathrm{goe}}$ are both strongly isotropic because of this result. Other types of distributional symmetry can
also lead to strong isotropy.

Remark 3.9 (Group Orbits). Here is a more general class of matrix Gaussian series where we can verify strong
isotropy using abstract arguments. Let $\mathcal{U}$ be a unitary representation of a finite group, and let $A$ be a fixed
Hermitian matrix with the same dimension. Consider the random Hermitian matrix

$$X := \sum_{U \in \mathcal{U}} \gamma_U \, U A U^*$$

where $\{\gamma_U : U \in \mathcal{U}\}$ is an independent family of standard normal variables. Since $\mathcal{U}$ acts on itself by permutation,

$$U X U^* \sim X \quad\text{for each } U \in \mathcal{U}.$$

This observation allows us to perform averaging arguments like the one in Proposition 3.8.

There are several ways to apply this property to argue that $X$ is strongly isotropic. For example, it suffices that

$$\mathcal{U}' := \{M : MU = UM \text{ for all } U \in \mathcal{U}\} = \{zI : z \in \mathbb{C}\}.$$

It is also sufficient that $\{Ua : U \in \mathcal{U}\}$ forms a (complete) tight frame for every vector $a$; see the paper [VW08] for
some situations where this condition holds.

Remark 3.10 (Spherical Designs). A spherical $t$-design is a collection $\{u_i : i = 1, \dots, N\}$ of points on the unit sphere
$\mathbb{S}^{d-1}$ in $\mathbb{R}^d$ with the property that

$$\int_{\mathbb{S}^{d-1}} \varphi(u) \, \mathrm{d}\mu(u) = \frac{1}{N} \sum_{i=1}^N \varphi(u_i)$$

where $\varphi$ is an arbitrary algebraic polynomial in $d$ variables with degree $t$ and $\mathrm{d}\mu$ is the Haar measure on the sphere.
See the paper [BRV13] for existence results and background references.

Given a spherical $t$-design, consider the random matrix

$$X := \sum_{i=1}^N \gamma_i u_i u_i^*$$

where $\{\gamma_i : i = 1, \dots, N\}$ is an independent family of standard normal variables. By construction, this random matrix
has the property that

$$\mathbb{E} X^p = (\mathbb{E}\, \overline{\operatorname{tr}}\, X^p) \cdot I \quad\text{for } p = 0, 1, 2, \dots, \lfloor t/2 \rfloor.$$

This variant of the strong isotropy property is sufficient for many purposes, provided that $t \approx \log d$.

3.6. A Second-Order Khintchine Inequality under Strong Isotropy. The second major result of this paper is a
second-order matrix Khintchine inequality that is valid for matrix Gaussian series with the strong isotropy property.
Like Theorem 3.5, this result uses the alignment parameter to control the norm of the random matrix.

Theorem 3.11 (Second-Order Khintchine under Strong Isotropy). Consider an Hermitian matrix Gaussian series
$X := \sum_{i=1}^n \gamma_i H_i$ with dimension $d$, as in (2.1), and assume that $X$ is strongly isotropic. Introduce the matrix standard
deviation and matrix alignment parameters:

$$\sigma(X) := \Big\| \sum_{i=1}^n H_i^2 \Big\|^{1/2} \quad\text{and}\quad w(X) := \max_{Q_\ell} \Big\| \sum_{i,j=1}^n H_i Q_1 H_j Q_2 H_i Q_3 H_j \Big\|^{1/4}.$$

The maximum ranges over a triple $(Q_1, Q_2, Q_3)$ of unitary matrices. Then, for each integer $p \ge 1$,

$$\big( \mathbb{E}\|X\|_{2p}^{2p} \big)^{1/(2p)} \le \big[ 2\sigma(X) + 2^{1/4} p^{5/4} w(X) \big] \cdot d^{1/(2p)}.$$

The symbol $\|\cdot\|$ refers to the spectral norm, while $\|\cdot\|_q$ is the Schatten $q$-norm.

The proof of this result appears in Section 9, where we also establish a lower bound. 

Theorem 3.11 shows that the moments of the random matrix $X$ are controlled by the standard deviation $\sigma(X)$
whenever $p^{5/4} w(X) \ll \sigma(X)$. If we take $p = \lceil \log d \rceil$, the Schatten $2p$-norm is essentially the same as the spectral
norm, and the dimensional factor on the right-hand side is negligible. Therefore,

$$w(X) \log^{5/4} d \ll \sigma(X) \quad\text{implies}\quad \mathbb{E}\|X\| \lesssim 2\sigma(X).$$

In the presence of strong isotropy, the spectral norm of a matrix Gaussian series is comparable with the standard
deviation $\sigma(X)$ whenever the alignment parameter $w(X)$ is relatively small!

In particular, we can apply this result to the GOE matrix $X_{\mathrm{goe}}$ because of Proposition 3.8. The calculation (3.2)
of the standard deviation and alignment parameters ensures that $\mathbb{E}\|X_{\mathrm{goe}}\| \lesssim 2$. As we observed in (2.9), this bound
is sharp. For this example, we can even take p =1 d^^^, which leads to very good probability bounds via Markov’s 
inequality. Furthermore, a more detailed version of Theorem 3.11, appearing in Section 9, is precise enough to 
show that the semicircle law is the limiting spectral distribution of the GOE. 

On the other hand, the dependence on the exponent $p$ in Theorem 3.11 is suboptimal. This point is evident
when we consider the diagonal Gaussian matrix $X_{\mathrm{diag}}(d)$. Indeed, Theorem 3.11 only implies the bound

$$\mathbb{E}\|X_{\mathrm{diag}}\| \le \mathrm{const} \cdot \log^{5/4} d.$$

As we observed in (2.9), the power on the logarithm should be one-half.
3.7. Discussion. This paper opens a new chapter in the theory of matrix concentration and noncommutative
moment inequalities. Our main technical contribution is to demonstrate that the matrix Khintchine inequality,
Proposition 2.3, is not the last word on the behavior of a matrix Gaussian series. Indeed, we have shown that the matrix
variance does not contain sufficient information to determine the expected norm of a matrix Gaussian series. We 
have also identified another quantity, the matrix alignment parameter, that allows us to obtain better bounds for 
every matrix Gaussian series. Furthermore, in the presence of more extensive distributional information, it is even 
possible to obtain numerically sharp bounds for the norm of certain matrix Gaussian series. 

There are a number of ways to extend the ideas and results in this paper: 

Higher-Order Alignment: If we consider alignment parameters involving 2k coefficient matrices, it is possible to 
improve the term p^'^(J 2 p in Theorem 3.5 to See Section 8.1 for some additional details. 

Other Matrix Series: We can use exchangeable pairs techniques [MJC+14] to study matrix series of the form
$X := \sum_{i=1}^n \xi_i H_i$ where $\{\xi_i\}$ is an independent family of scalar random variables. This approach is potentially
quite interesting when the $\xi_i$ are Bernoulli (that is, 0-1) random variables.

Independent Sums: We can use conditioning and symmetrization, as in (2.2), to apply Theorem 3.5 to a sum of
independent random matrices. See [CGT12, App.] for an example of this type of argument.

Rectangular Matrices: The techniques here also give results for rectangular random matrices by way of the
Hermitian dilation [Tro15, Sec. 2.1.16]. In this setting, a different notion of strong isotropy becomes relevant;
see Section 9.1.

We have not elaborated on these ideas because there is also evidence that alignment parameters will not lead to 
final results on matrix concentration. 

3.8. Related Work. There are very few techniques in the literature on random matrices that satisfy all three of
our requirements: flexibility, ease of use, and power. In particular, for many practical applications, it is
important to be able to work with an arbitrary sum of independent random matrices. We have chosen to study 
matrix Gaussian series because they are the simplest instance of this model, and they may lead to further insights 
about the general problem. 

Most classical work in random matrix theory concerns very special classes of random matrices; the books [BS10,
Tao12] provide an overview of some of the main lines of research in this field. There are some specific subareas
of random matrix theory that address more general models. The monograph [NS06] gives an introduction to free
probability. The book chapter [Ver12] describes a collection of methods from Banach space geometry. The
monograph [Tro15] covers the theory of matrix concentration. The last three works have a wide scope of applicability,
but none of them provides the ultimate description of the behavior of a sum of independent random matrices. 

There is one specific strand of research that we would like to draw out because it is very close in spirit to this
paper. Recently, Bandeira & van Handel [BV14] and van Handel [vH15] have studied the behavior of a real
symmetric Gaussian matrix whose entries are independent and centered but have inhomogeneous variances (the
independent-entry model). A $d \times d$ random matrix from this class can be written as

$$X_{\mathrm{indep}} := \sum_{i,j=1}^d a_{ij} \gamma_{ij} \cdot (E_{ij} + E_{ji}) \quad\text{for } a_{ij} \in \mathbb{R}.$$

As usual, $\{\gamma_{ij}\}$ is an independent family of standard normal random variables, and we assume that $a_{ij} = a_{ji}$
without loss of generality.

To situate this model in the context of our work, observe that matrix Gaussian series are significantly more
general than the independent-entry model. The strongly isotropic model is incomparable with the
independent-entry model. To see why, recall that strongly isotropic matrices can have dependent entries. At the same time,
$\mathbb{E} X_{\mathrm{indep}}^p$ is diagonal for each integer $p \ge 0$, but it need not be a scalar matrix.

For the independent-entry model, Bandeira & van Handel [BV14] established the following (sharp) bound:

$$\mathbb{E}\|X_{\mathrm{indep}}\| \lesssim 2\sigma(X_{\mathrm{indep}}) + \mathrm{const} \cdot \max_{ij} |a_{ij}| \cdot \sqrt{\log d}. \tag{3.5}$$

The maximum entry $\max_{ij} |a_{ij}|$ plays the same role in this formula as the matrix alignment parameter plays in this
paper. The paper [BV14] leans heavily on the independence assumption, so it is not clear whether the ideas extend
to a more general setting.

To compare the result (3.5) with the work here, we can compute the matrix alignment parameter for the
independent-entry model using a difficult extension of the calculation in Section 4.2. This effort yields

i"(^indep) = (maxj y I aij |^| . 

We see that the matrix alignment parameter is somewhat larger than the maximum entry $\max_{ij} |a_{ij}|$. Thus, for the
independent-entry model, Theorem 3.5 gives us a better result than the classical Khintchine inequality, Proposition 2.3,
but it is somewhat weaker than (3.5). Theorem 3.11 would give a result close to the bound (3.5), but it does not
always apply because the independent-entry model need not be strongly isotropic.

The independent-entry model is not adequate to reach results with the same power and scope as the current
generation of matrix concentration bounds [Tro15]. Nevertheless, the estimate (3.5) strongly suggests that there
are better ways of summarizing the interactions of the coefficients in an Hermitian matrix Gaussian series
$X := \sum_{i=1}^n \gamma_i H_i$ than the alignment parameter $w(X)$. One possibility is the weak variance parameter:

$$\sigma_*(X) := \sup_{\|u\| = \|v\| = 1} \big( \mathbb{E} |u^* X v|^2 \big)^{1/2}.$$

For the independent-entry model, this quantity reduces to $\mathrm{const} \cdot \max_{ij} |a_{ij}|$. The idea of considering $\sigma_*(X)$ is
motivated by the discussion in [Tro12, Sec. 4], as well as the work in [BV14, vH15]. Unfortunately, at this stage,
it is not clear whether there are any parameters that allow us to obtain a simple description of the behavior of a
Gaussian matrix in the absence of burdensome independence or isotropy assumptions. This is a frontier for future
work.


4. Computation of the Matrix Alignment Parameters 

In this section, we show how to compute the matrix alignment parameter for the two random matrices in the 
introduction, the diagonal Gaussian matrix and the GOE matrix. Afterward, we show by example that neither 
Theorem 3.5 nor Theorem 3.11 can hold if we remove the unitary factors from the matrix alignment parameter. 

4.1. A Diagonal Gaussian Matrix. The diagonal Gaussian matrix takes the form

$$X_{\mathrm{diag}} := \sum_{i=1}^d \gamma_i E_{ii}.$$

The matrix variance $\operatorname{Var}(X_{\mathrm{diag}}) = \mathbb{E} X_{\mathrm{diag}}^2 = I$. It follows that the matrix standard deviation parameters, defined
in (2.3), satisfy

$$\sigma_p(X_{\mathrm{diag}}) = \big\| \operatorname{Var}(X_{\mathrm{diag}})^{1/2} \big\|_p = d^{1/p} \quad\text{for } 1 \le p \le \infty.$$

We will show that the matrix alignment parameters, defined in (3.1), satisfy

$$w_p(X_{\mathrm{diag}}) = d^{1/p} \quad\text{for } 4 \le p \le \infty.$$

Thus, for this example, the second-order matrix Khintchine inequalities, Theorem 3.5 and Theorem 3.11, do not
improve over the matrix Khintchine inequality, Proposition 2.3. This outcome is natural, given that the classical
result is essentially optimal in this case.

Let us evaluate the matrix alignment parameter. For a triple $(Q, S, U)$ of unitary matrices, form the sum

$$W(Q, S, U) := \sum_{i,j=1}^d E_{ii} Q E_{jj} S E_{ii} U E_{jj} = \sum_{i,j=1}^d q_{ij} s_{ji} u_{ij} \, E_{ij} = Q \odot S^{T} \odot U.$$

We have written $\odot$ for the Schur (i.e., componentwise) product, and $^{T}$ is the transpose operation. When $Q = S =
U = I$, the sum collapses: $W(I, I, I) = I$. Therefore,

$$w_p(X_{\mathrm{diag}}) = \max_{Q, S, U} \big\| |W(Q, S, U)|^{1/4} \big\|_p \ge \|I\|_p = d^{1/p} \quad\text{for } p \ge 1.$$

But Proposition 3.2 shows that

$$w_p(X_{\mathrm{diag}}) \le \sigma_p(X_{\mathrm{diag}}) = d^{1/p} \quad\text{for each } p \ge 4.$$

Therefore, $w_p(X_{\mathrm{diag}}) = \sigma_p(X_{\mathrm{diag}}) = d^{1/p}$ for $p \ge 4$. The result for $p = \infty$ follows when we take limits.

Remark 4.1 (Commutativity). A similar calculation is valid whenever the family {Hi} of coefficient matrices in the 
matrix Gaussian series (2.1) commutes. 
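
The Schur-product identity $W(Q,S,U) = Q \odot S^{t} \odot U$ for the diagonal Gaussian matrix can be checked numerically. The following NumPy sketch is only an illustration: the helper names `random_unitary` and `diag_alignment_sum` are hypothetical, and the dimension $d = 4$ is an arbitrary choice.

```python
import numpy as np

def random_unitary(d, rng):
    # The Q factor of a complex Gaussian matrix is unitary.
    z = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
    q, _ = np.linalg.qr(z)
    return q

def diag_alignment_sum(Q, S, U):
    # Brute-force evaluation of sum_{i,j} E_ii Q E_jj S E_ii U E_jj.
    d = Q.shape[0]
    W = np.zeros((d, d), dtype=complex)
    for i in range(d):
        Eii = np.zeros((d, d))
        Eii[i, i] = 1.0
        for j in range(d):
            Ejj = np.zeros((d, d))
            Ejj[j, j] = 1.0
            W += Eii @ Q @ Ejj @ S @ Eii @ U @ Ejj
    return W

rng = np.random.default_rng(0)
d = 4
Q, S, U = (random_unitary(d, rng) for _ in range(3))

W = diag_alignment_sum(Q, S, U)
schur = Q * S.T * U          # componentwise product; S.T is the plain transpose
max_err = np.abs(W - schur).max()

# With Q = S = U = I the sum should collapse to the identity.
W_identity = diag_alignment_sum(np.eye(d), np.eye(d), np.eye(d))
```

The brute-force sum and the Schur product should agree up to rounding error, and the identity triple should return the identity matrix.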

4.2. A GOE Matrix. The GOE matrix takes the form

\[ X_{\mathrm{goe}} := \frac{1}{\sqrt{2d}} \sum_{i,j=1}^d \gamma_{ij} (E_{ij} + E_{ji}). \]

An easy calculation shows that the matrix variance satisfies

\[ \operatorname{Var}(X_{\mathrm{goe}}) = \frac{1}{2d} \sum_{i,j=1}^d (E_{ij} + E_{ji})^2 = (1 + d^{-1}) \cdot I. \]

Therefore, the matrix standard deviation parameters, defined in (2.3), equal

\[ \sigma_p(X_{\mathrm{goe}}) = \|\operatorname{Var}(X_{\mathrm{goe}})^{1/2}\|_p = \sqrt{1 + d^{-1}} \cdot d^{1/p} \quad\text{for } 1 \le p \le \infty. \]

We will demonstrate that the matrix alignment parameters, defined in (3.1), satisfy

\[ w_p(X_{\mathrm{goe}}) \le (d^{-1} + 3d^{-2})^{1/4} \cdot d^{1/p} \quad\text{for } 4 \le p \le \infty. \]

When d is large, the matrix alignment parameters are much smaller than the matrix standard deviation parameters. 
As a consequence, the second-order matrix Khintchine inequalities deliver a substantial gain over the classical 
matrix Khintchine inequality. 

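Let us compute the matrix alignment parameter. For a triple $(Q, S, U)$ of unitary matrices, introduce the (unnormalized) sum

\[ W(Q, S, U) := \sum_{i_1,i_2,j_1,j_2=1}^d (E_{i_1 i_2} + E_{i_2 i_1})\, Q\, (E_{j_1 j_2} + E_{j_2 j_1})\, S\, (E_{i_1 i_2} + E_{i_2 i_1})\, U\, (E_{j_1 j_2} + E_{j_2 j_1}). \]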

It is not hard to evaluate this sum if we take care. First, distribute terms: 

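\[
\begin{aligned}
W(Q, S, U) = \sum_{i_1,i_2,j_1,j_2=1}^d \Big[
&\big( q_{i_2 j_1} s_{j_2 i_1} u_{i_2 j_2} + q_{i_2 j_1} s_{j_2 i_2} u_{i_1 j_2} + q_{i_2 j_2} s_{j_1 i_1} u_{i_2 j_2} + q_{i_2 j_2} s_{j_1 i_2} u_{i_1 j_2} \big) \cdot E_{i_1 j_1} \\
{}+ &\big( q_{i_2 j_1} s_{j_2 i_1} u_{i_2 j_1} + q_{i_2 j_1} s_{j_2 i_2} u_{i_1 j_1} + q_{i_2 j_2} s_{j_1 i_1} u_{i_2 j_1} + q_{i_2 j_2} s_{j_1 i_2} u_{i_1 j_1} \big) \cdot E_{i_1 j_2} \\
{}+ &\big( q_{i_1 j_1} s_{j_2 i_1} u_{i_2 j_2} + q_{i_1 j_1} s_{j_2 i_2} u_{i_1 j_2} + q_{i_1 j_2} s_{j_1 i_1} u_{i_2 j_2} + q_{i_1 j_2} s_{j_1 i_2} u_{i_1 j_2} \big) \cdot E_{i_2 j_1} \\
{}+ &\big( q_{i_1 j_1} s_{j_2 i_1} u_{i_2 j_1} + q_{i_1 j_1} s_{j_2 i_2} u_{i_1 j_1} + q_{i_1 j_2} s_{j_1 i_1} u_{i_2 j_1} + q_{i_1 j_2} s_{j_1 i_2} u_{i_1 j_1} \big) \cdot E_{i_2 j_2} \Big].
\end{aligned}
\]

In each line, we can sum through the two free indices to identify four matrix products. For example, in the first line, we can sum on $i_2$ and $j_2$. This step yields

\[
\begin{aligned}
W(Q, S, U) = {} & \sum_{i_1,j_1=1}^d \big[ S^{t}U^{t}Q + USQ + \operatorname{tr}(Q^{t}U)\, S^{t} + UQ^{t}S^{t} \big]_{i_1 j_1} E_{i_1 j_1} \\
{}+ {} & \sum_{i_1,j_2=1}^d \big[ \operatorname{tr}(Q^{t}U)\, S^{t} + UQ^{t}S^{t} + S^{t}U^{t}Q + USQ \big]_{i_1 j_2} E_{i_1 j_2} \\
{}+ {} & \sum_{i_2,j_1=1}^d \big[ USQ + S^{t}U^{t}Q + UQ^{t}S^{t} + \operatorname{tr}(Q^{t}U)\, S^{t} \big]_{i_2 j_1} E_{i_2 j_1} \\
{}+ {} & \sum_{i_2,j_2=1}^d \big[ UQ^{t}S^{t} + \operatorname{tr}(Q^{t}U)\, S^{t} + USQ + S^{t}U^{t}Q \big]_{i_2 j_2} E_{i_2 j_2}.
\end{aligned}
\]

Sum through the remaining indices to reach

\[
\begin{aligned}
W(Q, S, U) = {} & \big( S^{t}U^{t}Q + USQ + \operatorname{tr}(Q^{t}U)\, S^{t} + UQ^{t}S^{t} \big) \\
{}+ {} & \big( \operatorname{tr}(Q^{t}U)\, S^{t} + UQ^{t}S^{t} + S^{t}U^{t}Q + USQ \big) \\
{}+ {} & \big( USQ + S^{t}U^{t}Q + UQ^{t}S^{t} + \operatorname{tr}(Q^{t}U)\, S^{t} \big) \\
{}+ {} & \big( UQ^{t}S^{t} + \operatorname{tr}(Q^{t}U)\, S^{t} + USQ + S^{t}U^{t}Q \big).
\end{aligned}
\]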

Twelve of the sixteen terms are unitary matrices, and the remaining four are scaled unitary matrices. Furthermore, each trace is bounded in magnitude by $d$, the worst case being $Q = U = I$. Applying the definition of the Schatten norm, the triangle inequality, and unitary invariance, we find that

\[ \big\| |W(Q,S,U)|^{1/4} \big\|_p = \|W(Q,S,U)\|_{p/4}^{1/4} \le \big( (4d + 12) \cdot \|I\|_{p/4} \big)^{1/4} = (4d + 12)^{1/4} \cdot d^{1/p} \quad\text{for } p \ge 4. \]

To compute $w_p(X_{\mathrm{goe}})$, we must reintroduce the scaling $(2d)^{-1/2}$, which gives the advertised result:

\[ w_p(X_{\mathrm{goe}}) \le (2d)^{-1/2} \cdot (4d + 12)^{1/4} \cdot d^{1/p} = (d^{-1} + 3d^{-2})^{1/4} \cdot d^{1/p}. \]

To obtain the bound for $p = \infty$, we simply take limits.
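
The sixteen-term expansion for the GOE matrix is easy to get wrong, so a brute-force check may be useful. Collecting the sixteen terms (they come in four repeated groups) gives $W(Q,S,U) = 4\,(S^{t}U^{t}Q + USQ + UQ^{t}S^{t} + \operatorname{tr}(Q^{t}U)\,S^{t})$. The sketch below compares this closed form against the defining sum for random unitaries; the helper names and the small dimension $d = 3$ are assumptions made for illustration only.

```python
import numpy as np

def random_unitary(d, rng):
    # The Q factor of a complex Gaussian matrix is unitary.
    z = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
    q, _ = np.linalg.qr(z)
    return q

def unit(d, i, j):
    # Standard basis matrix E_ij.
    E = np.zeros((d, d))
    E[i, j] = 1.0
    return E

def goe_alignment_sum(Q, S, U):
    # Unnormalized sum over all ordered pairs (i1, i2) and (j1, j2).
    d = Q.shape[0]
    coeffs = [unit(d, i, j) + unit(d, j, i) for i in range(d) for j in range(d)]
    W = np.zeros((d, d), dtype=complex)
    for A in coeffs:
        AQ = A @ Q
        for B in coeffs:
            W += AQ @ B @ S @ A @ U @ B
    return W

rng = np.random.default_rng(1)
d = 3
Q, S, U = (random_unitary(d, rng) for _ in range(3))

W = goe_alignment_sum(Q, S, U)
closed_form = 4 * (S.T @ U.T @ Q + U @ S @ Q + U @ Q.T @ S.T
                   + np.trace(Q.T @ U) * S.T)
max_err = np.abs(W - closed_form).max()

# Spectral norm of W, to compare with the triangle-inequality bound 4d + 12.
spectral_norm = np.linalg.norm(W, 2)
```

The spectral norm of $W$ should never exceed $4d + 12$, which is the estimate used to bound the alignment parameter.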

4.3. The Unitaries are Necessary. Suppose that $X := \sum_{i=1}^n \gamma_i H_i$ is an Hermitian matrix Gaussian series with dimension $d$, and let $\sigma(X)$ be the matrix standard deviation (2.3). Consider the alternative alignment parameter

\[ \delta(X) := \Big\| \sum_{i,j=1}^n H_i H_j H_i H_j \Big\|^{1/4}. \]

This quantity is suggested by the discussion in Section 3.1. Consider a general estimate of the form 


\[ \mathbb{E}\|X\| \le f(d) \cdot \sigma(X) + g(d) \cdot \delta(X). \tag{4.1} \]

We will demonstrate that, for every choice of the function $g$, there is a lower bound $f(d) \ge \mathrm{const} \cdot \sqrt{\log d}$. From this claim, we deduce that it is impossible to improve over the classical Khintchine inequality by using the second-order quantity $\delta(X)$. Therefore, the unitary matrices in the alignment parameter $w(X)$ play a critical role. Most of this argument was developed by Afonso Bandeira; we are grateful to him for allowing us to include it.

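Introduce the Pauli spin matrices

\[ H_1 := \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}, \qquad H_2 := \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}, \qquad H_3 := \begin{bmatrix} 0 & -\mathrm{i} \\ \mathrm{i} & 0 \end{bmatrix}. \]

These matrices are Hermitian and unitary, so $H_i^2 = I$ for $i = 1, 2, 3$. Furthermore, they satisfy the relations $H_i H_j H_i H_j = -I$ when $i \ne j$. Next, define $H_0 := \sqrt{\alpha}\, I$, where $\alpha := 2\sqrt{3} - 3$. Calculate that

\[
\begin{aligned}
\sum_{i,j=0}^3 H_i H_j H_i H_j &= \sum_{i=0}^3 H_i^4 + \sum_{j=1}^3 H_0 H_j H_0 H_j + \sum_{i=1}^3 H_i H_0 H_i H_0 + \sum_{1 \le i \ne j \le 3} H_i H_j H_i H_j \\
&= (\alpha^2 + 3)\, I + 6\alpha\, I - 6\, I = (\alpha^2 + 6\alpha - 3)\, I = 0.
\end{aligned}
\]

Indeed, $\alpha$ is a positive root of the quadratic.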

Consider the two-dimensional Gaussian series $Y$ generated by the matrices $H_0, H_1, H_2, H_3$:

\[ Y := \sum_{i=0}^3 \gamma_i H_i. \]

As usual, $\{\gamma_i\}$ is an independent family of standard normal variables. For the series $Y$, we have already shown that the alternative alignment parameter $\delta(Y) = 0$. Let us compute the variance and standard deviation:

\[ \operatorname{Var}(Y) = \sum_{i=0}^3 H_i^2 = (\alpha + 3)\, I = 2\sqrt{3}\, I \quad\text{and}\quad \sigma(Y) = \|\operatorname{Var}(Y)\|^{1/2} = 12^{1/4}. \]
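
The two computations for the Pauli example (the vanishing fourth-order sum and the value of the variance) can be confirmed directly. This is a minimal numerical sketch; the variable names are chosen here for illustration.

```python
import numpy as np

alpha = 2 * np.sqrt(3) - 3
I2 = np.eye(2, dtype=complex)

# H_0 = sqrt(alpha) * I, followed by the three Pauli spin matrices.
H = [
    np.sqrt(alpha) * I2,
    np.array([[1, 0], [0, -1]], dtype=complex),
    np.array([[0, 1], [1, 0]], dtype=complex),
    np.array([[0, -1j], [1j, 0]], dtype=complex),
]

# Sum of H_i H_j H_i H_j over all pairs (i, j) with indices 0..3.
fourth_order_sum = sum(Hi @ Hj @ Hi @ Hj for Hi in H for Hj in H)

# Matrix variance and standard deviation of the series Y.
variance = sum(Hi @ Hi for Hi in H)
sigma = np.sqrt(np.linalg.norm(variance, 2))
```

The fourth-order sum should vanish up to rounding, while the variance equals $2\sqrt{3}\, I$ and the standard deviation equals $12^{1/4}$.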

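Expanding the random matrix $Y$ in coordinates, we also find that

\[ Y = \begin{bmatrix} \sqrt{\alpha}\,\gamma_0 + \gamma_1 & \gamma_2 - \mathrm{i}\gamma_3 \\ \gamma_2 + \mathrm{i}\gamma_3 & \sqrt{\alpha}\,\gamma_0 - \gamma_1 \end{bmatrix}. \]

Therefore, the top-left entry $(Y)_{11}$ is a centered normal random variable with variance $1 + \alpha = 2(\sqrt{3} - 1)$.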

To obtain the counterexample to the bound (4.1), fix an integer $d \ge 1$. Let $Y_1, \dots, Y_d$ be independent copies of the two-dimensional Gaussian series $Y$, and construct the $2d$-dimensional matrix Gaussian series

\[ X_{\mathrm{spin}} := Y_1 \oplus \cdots \oplus Y_d = \sum_{i=1}^d E_{ii} \otimes Y_i = \sum_{i=1}^d \sum_{j=0}^3 \gamma_{ij}\, (E_{ii} \otimes H_j). \]

We have written $\oplus$ for direct sum and $\otimes$ for the Kronecker product; the matrices $E_{ii}$ are the diagonal units with dimension $d \times d$, and $\{\gamma_{ij}\}$ is an independent family of standard normal variables.

Extending the calculations above, we find that $\sigma(X_{\mathrm{spin}}) = 12^{1/4}$ and $\delta(X_{\mathrm{spin}}) = 0$. Meanwhile, the norm of $X_{\mathrm{spin}}$ is bounded below by the absolute value of each of its diagonal entries. In particular,

\[ \mathbb{E}\|X_{\mathrm{spin}}\| \ge \mathbb{E} \max_i |(Y_i)_{11}| \ge \mathrm{const} \cdot \big(2(\sqrt{3} - 1)\big)^{1/2} \cdot \sqrt{\log d}. \]

We have used the fact that the expected maximum of $d$ independent standard normal variables is proportional to $\sqrt{\log d}$. Assuming that (4.1) is valid, we can sequence these estimates to obtain

\[ \mathrm{const} \cdot \sqrt{\log d} \le f(d) \cdot \sigma(X_{\mathrm{spin}}) + g(d) \cdot \delta(X_{\mathrm{spin}}) = 12^{1/4} \cdot f(d). \]

Therefore, the function $f(d)$ must grow at least as fast as $\sqrt{\log d}$. We conclude that a bound of the form (4.1) can never improve over the classical matrix Khintchine inequality.

5. Notation & Background

Before we enter into the body of the paper, let us set some additional notation and state a few background results. First, $\mathbb{M}_d$ denotes the complex linear space of $d \times d$ matrices with complex entries. We write $\mathbb{H}_d$ for the real-linear subspace of $\mathbb{M}_d$ that consists of Hermitian matrices. The symbol ${}^*$ represents conjugate transposition. We write $0$ for the zero matrix and $I$ for the identity. The matrix $E_{ij}$ has a one in the $(i,j)$ position and zeros elsewhere. The dimensions of these matrices are typically determined by context.

For an Hermitian matrix $A$, we define the integer powers $A^p$ for $p = 0, 1, 2, 3, \dots$ in the usual way by iterated multiplication. For a positive-semidefinite matrix $P$, we can also define complex powers $P^z$ by raising each eigenvalue of $P$ to the power $z$ while maintaining the eigenvectors. In particular, $P^{1/2}$ is the unique positive-semidefinite square root of $P$. The matrix absolute value is defined for a general matrix $B$ by the rule $|B| := (B^* B)^{1/2}$. Note that $|P| = P$ when $P$ is positive semidefinite.

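The trace and normalized trace of a matrix are given by

\[ \operatorname{tr} B := \sum_{j=1}^d b_{jj} \quad\text{and}\quad \bar{\operatorname{tr}}\, B := \frac{1}{d} \operatorname{tr} B \quad\text{for } B \in \mathbb{M}_d. \]

We use the convention that a power binds before the trace to avoid unnecessary parentheses; powers also bind before expectation. The Schatten $p$-norm is defined for an arbitrary matrix $B$ via the rule

\[ \|B\|_p := (\operatorname{tr} |B|^p)^{1/p} \quad\text{for } p \ge 1. \]

The Schatten $\infty$-norm $\|\cdot\|_\infty$ coincides with the spectral norm $\|\cdot\|$. This work uses both trace powers and Schatten norms, depending on which one is conceptually clearer. We require some Hölder inequalities involving the trace and the Schatten norms. For matrices $A, B \in \mathbb{M}_d$ and $\varrho \ge 1$,

\[ |\operatorname{tr}(AB)| \le (\operatorname{tr} |A|^{\varrho})^{1/\varrho} \cdot (\operatorname{tr} |B|^{\varrho'})^{1/\varrho'} \quad\text{where } \varrho' := \varrho/(\varrho - 1). \tag{5.1} \]

Furthermore,

\[ \|A^* B\|_{\varrho}^2 \le \|A^* A\|_{\varrho} \cdot \|B^* B\|_{\varrho}. \tag{5.2} \]

These results are drawn from [Bha97, Chap. IV].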
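
For readers who want to experiment with these definitions, here is a small NumPy sketch that evaluates Schatten norms through singular values and spot-checks the inequalities (5.1) and (5.2) on random matrices. The function name `schatten_norm`, the dimension, and the exponent $\varrho = 3$ are illustrative assumptions.

```python
import numpy as np

def schatten_norm(B, p):
    # (tr |B|^p)^(1/p) equals the l_p norm of the singular values of B.
    s = np.linalg.svd(B, compute_uv=False)
    if np.isinf(p):
        return float(s.max())
    return float(np.sum(s ** p) ** (1.0 / p))

rng = np.random.default_rng(2)
d = 5
A = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
B = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))

rho = 3.0
rho_conj = rho / (rho - 1.0)

# Trace Hoelder inequality (5.1).
lhs_51 = abs(np.trace(A @ B))
rhs_51 = schatten_norm(A, rho) * schatten_norm(B, rho_conj)

# Cauchy-Schwarz inequality for Schatten norms (5.2).
lhs_52 = schatten_norm(A.conj().T @ B, rho) ** 2
rhs_52 = schatten_norm(A.conj().T @ A, rho) * schatten_norm(B.conj().T @ B, rho)

# The Schatten infinity-norm coincides with the spectral norm.
spectral_gap = abs(schatten_norm(A, np.inf) - np.linalg.norm(A, 2))
```

A random test of this kind cannot prove the inequalities, but it is a quick way to catch a misread exponent.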

6. The Trace Moments of a Matrix Gaussian Series 

For each major result in this paper, the starting point is a formula for the trace moments of a matrix Gaussian 
series. 

Lemma 6.1 (Trace Moment Identity). Let $X := \sum_{i=1}^n \gamma_i H_i$ be an Hermitian matrix Gaussian series, as in (2.1). For each integer $p \ge 1$, we have the identity

\[ \mathbb{E} \operatorname{tr} X^{2p} = \sum_{q=0}^{2p-2} \sum_{i=1}^n \mathbb{E} \operatorname{tr} \big[ H_i X^q H_i X^{2p-2-q} \big]. \tag{6.1} \]

The easy proof of Lemma 6.1 appears in the next two subsections.

Integration by parts is not foreign in the study of Gaussian random matrices; for example, see [AGZ10, Sec. 2.4.1] or [Kem13, Sec. 9]. The exchangeable pairs method for establishing matrix concentration is also based on an elementary, but conceptually challenging, analog of integration by parts [MJC+14, Lem. 2.4]. Aside from these works, we are not aware of any application of related techniques to prove results on matrix concentration.

6.1. Preliminaries. To obtain Lemma 6.1, the main auxiliary tool is the classical integration by parts formula for a function of a standard normal vector [NP12, Lem. 1.1.1]. In the form required here, the result can be derived with basic calculus.

Fact 6.2 (Gaussian Integration by Parts). Let $\gamma \in \mathbb{R}^n$ be a vector with independent standard normal entries, and let $f : \mathbb{R}^n \to \mathbb{R}$ be a function whose derivative is absolutely integrable with respect to the standard normal measure. Then

\[ \sum_{i=1}^n \mathbb{E}[\gamma_i f(\gamma)] = \sum_{i=1}^n \mathbb{E}[(\partial_i f)(\gamma)]. \]

The symbol $\partial_i$ denotes differentiation with respect to the $i$th coordinate.


We also use a well-known formula for the derivative of a matrix power [Bha97, Sec. X.4]. 
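
Fact 6.3 (Derivative of a Matrix Power). Let $A : \mathbb{R} \to \mathbb{H}_d$ be a differentiable function. For each integer $\varrho \ge 1$,

\[ \frac{\mathrm{d}}{\mathrm{d}u} \big( A(u)^{\varrho} \big) = \sum_{k=0}^{\varrho-1} A(u)^{k} \cdot \frac{\mathrm{d}}{\mathrm{d}u} A(u) \cdot A(u)^{\varrho-1-k}. \tag{6.2} \]

In particular,

\[ \frac{\mathrm{d}}{\mathrm{d}u} \operatorname{tr} A(u)^{\varrho} = \varrho \cdot \operatorname{tr} \Big[ A(u)^{\varrho-1} \cdot \frac{\mathrm{d}}{\mathrm{d}u} A(u) \Big]. \]

The symbol $\cdot$ refers to ordinary matrix multiplication.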

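6.2. Proof of Lemma 6.1. Let us treat the random matrix $X$ as a matrix-valued function of the standard normal vector $\gamma := (\gamma_1, \dots, \gamma_n)$. That is,

\[ X = X(\gamma) = \sum_{i=1}^n \gamma_i H_i. \]

Write $X^{2p} = X \cdot X^{2p-1}$ and distribute the sum in the first factor:

\[ \mathbb{E} \operatorname{tr} X^{2p} = \mathbb{E} \operatorname{tr} \Big[ \Big( \sum_{i=1}^n \gamma_i H_i \Big) X^{2p-1} \Big] = \sum_{i=1}^n \mathbb{E} \big[ \gamma_i \cdot \operatorname{tr} [ H_i X^{2p-1} ] \big]. \]

The Gaussian integration by parts formula, Fact 6.2, implies that

\[ \mathbb{E} \operatorname{tr} X^{2p} = \sum_{i=1}^n \mathbb{E} \operatorname{tr} \big[ H_i \cdot \partial_i ( X^{2p-1} ) \big]. \]

Since $\partial_i X = H_i$, the derivative formula (6.2) yields

\[ \mathbb{E} \operatorname{tr} X^{2p} = \sum_{i=1}^n \mathbb{E} \operatorname{tr} \Big[ H_i \cdot \sum_{q=0}^{2p-2} X^q H_i X^{2p-2-q} \Big] = \sum_{q=0}^{2p-2} \sum_{i=1}^n \mathbb{E} \operatorname{tr} \big[ H_i X^q H_i X^{2p-2-q} \big]. \]

This completes the proof of the formula (6.1).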
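
The trace moment identity (6.1) can be verified exactly for small cases, because both sides are expectations of polynomials in the Gaussian variables. The sketch below uses a tensor-product Gauss-Hermite rule, which integrates such polynomials exactly when enough nodes are used. The helper names, the sizes $n = 2$, $d = 3$, and the choice $p = 2$ are assumptions made for illustration.

```python
import itertools
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

def gaussian_expectation(func, n, deg=6):
    # Tensor-product Gauss-Hermite rule for the standard normal measure;
    # exact for polynomials of degree at most 2*deg - 1 in each coordinate.
    nodes, weights = hermegauss(deg)
    weights = weights / np.sqrt(2 * np.pi)
    total = 0.0
    for idx in itertools.product(range(deg), repeat=n):
        g = nodes[list(idx)]
        total += np.prod(weights[list(idx)]) * func(g)
    return total

def random_hermitian(d, rng):
    z = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
    return (z + z.conj().T) / 2

rng = np.random.default_rng(3)
n, d, p = 2, 3, 2
H = [random_hermitian(d, rng) for _ in range(n)]
mpow = np.linalg.matrix_power

def series(g):
    # X(gamma) = sum_i gamma_i H_i
    return sum(gi * Hi for gi, Hi in zip(g, H))

def lhs_integrand(g):
    return np.trace(mpow(series(g), 2 * p)).real

def rhs_integrand(g):
    X = series(g)
    return sum(
        np.trace(Hi @ mpow(X, q) @ Hi @ mpow(X, 2 * p - 2 - q)).real
        for q in range(2 * p - 1)   # q = 0, ..., 2p - 2
        for Hi in H
    )

lhs = gaussian_expectation(lhs_integrand, n)
rhs = gaussian_expectation(rhs_integrand, n)
```

The two quadrature values should agree up to floating-point error.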

7. A Short Proof of the Matrix Khintchine Inequality 

Historically, proofs of the matrix Khintchine inequality have been rather complicated, but the result is actually an immediate consequence of Lemma 6.1. We will present this argument in detail because it has not appeared in the literature. Furthermore, the approach serves as a template for the more sophisticated theorems that are the main contributions of this paper. Let us restate Proposition 2.3 in the form in which we will establish it.

Proposition 7.1 (Matrix Khintchine). Let $X := \sum_{i=1}^n \gamma_i H_i$ be an Hermitian matrix Gaussian series, as in (2.1). Define the matrix variance and standard deviation parameters

\[ V := \operatorname{Var}(X) = \sum_{i=1}^n H_i^2 \quad\text{and}\quad \sigma_{2q} := (\operatorname{tr} V^q)^{1/(2q)} \quad\text{for each } q \ge 1. \tag{7.1} \]

Then, for each integer $p \ge 1$,

\[ \big( \mathbb{E} \operatorname{tr} X^{2p} \big)^{1/(2p)} \le \sqrt{2p - 1} \cdot \sigma_{2p}. \]

The short proof of Proposition 7.1 appears in the next two subsections. The approach parallels the exchangeable pairs method that has been used to establish the matrix Khintchine inequality for Rademacher series [MJC+14, Cor. 7.3]. Here, we replace exchangeable pairs with the conceptually simpler argument based on Gaussian integration by parts. To reach the statement of Proposition 2.3, we simply rewrite the trace in terms of a Schatten norm.

Remark 7.2 (Noninteger Moments). Our proof of Proposition 7.1 can be adapted to obtain moment bounds for all $p \ge 2$. See [MJC+14, Cor. 7.3] for a closely related argument.

7.1. Preliminaries. The main idea in the proof is to simplify the trace moment identity (6.1) with an elementary 
matrix inequality. Anticipating subsequent arguments, we state the inequality in greater generality than we need 
right now. 

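Proposition 7.3. Suppose that $H$ and $A$ are Hermitian matrices of the same size. Let $q$ and $r$ be integers that satisfy $0 \le q \le r$. For each real number $s$ in the range $0 \le s \le \min\{q, r - q\}$,

\[ \operatorname{tr} \big[ H A^q H A^{r-q} \big] \le \operatorname{tr} \big[ H |A|^s H |A|^{r-s} \big]. \]

The proof of Proposition 7.3 depends on a numerical fact. For nonnegative numbers $\alpha$ and $\beta$, the function $\theta \mapsto \alpha^{\theta} \beta^{1-\theta} + \alpha^{1-\theta} \beta^{\theta}$ is convex on the interval $[0, 1]$, and it achieves its minimum at $\theta = 1/2$. Therefore,

\[ \alpha^{\theta} \beta^{1-\theta} + \alpha^{1-\theta} \beta^{\theta} \le \alpha^{\theta'} \beta^{1-\theta'} + \alpha^{1-\theta'} \beta^{\theta'} \quad\text{when } 0 \le \theta' \le \min\{\theta, 1 - \theta\}. \tag{7.2} \]

We need to lift this scalar inequality to matrices.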
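
Proof. Without loss of generality, we may change coordinates so that $A$ is diagonal: $A = \sum_i a_i E_{ii}$. Expanding both copies of $A$,

\[ \operatorname{tr} \big[ H A^q H A^{r-q} \big] = \sum_{i,j} a_i^{q} a_j^{r-q} \operatorname{tr} [ H E_{ii} H E_{jj} ] = \frac{1}{2} \sum_{i,j} \big( a_i^{q} a_j^{r-q} + a_i^{r-q} a_j^{q} \big) \operatorname{tr} [ H E_{ii} H E_{jj} ]. \]

After we take absolute values, the inequality (7.2) implies that

\[ a_i^{q} a_j^{r-q} + a_i^{r-q} a_j^{q} \le |a_i|^{s} |a_j|^{r-s} + |a_i|^{r-s} |a_j|^{s}. \]

The remaining trace is nonnegative: $\operatorname{tr} [ H E_{ii} H E_{jj} ] = |h_{ij}|^2$, where $h_{ij}$ are the components of the matrix $H$. As a consequence,

\[ \operatorname{tr} \big[ H A^q H A^{r-q} \big] \le \frac{1}{2} \sum_{i,j} \big( |a_i|^{s} |a_j|^{r-s} + |a_i|^{r-s} |a_j|^{s} \big) \operatorname{tr} [ H E_{ii} H E_{jj} ] = \operatorname{tr} \big[ H |A|^s H |A|^{r-s} \big]. \]

To reach the last identity, we reversed our steps to reassemble the sum into a trace. □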
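
Proposition 7.3 is a deterministic inequality, so it can be spot-checked on random Hermitian matrices. The sketch below sweeps over all admissible integers $q$ and a few values of $s$ for the fixed choice $r = 6$; the helper names and parameter choices are assumptions made for illustration, and a passing check is of course not a proof.

```python
import numpy as np

def random_hermitian(d, rng):
    z = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
    return (z + z.conj().T) / 2

def abs_power(A, s):
    # |A|^s for Hermitian A, computed from the spectral decomposition.
    vals, vecs = np.linalg.eigh(A)
    return (vecs * np.abs(vals) ** s) @ vecs.conj().T

rng = np.random.default_rng(4)
d, r = 5, 6
H = random_hermitian(d, rng)
A = random_hermitian(d, rng)
mpow = np.linalg.matrix_power

worst_gap = -np.inf
for q in range(r + 1):
    lhs = np.trace(H @ mpow(A, q) @ H @ mpow(A, r - q)).real
    for s in np.linspace(0.0, min(q, r - q), 5):
        rhs = np.trace(H @ abs_power(A, s) @ H @ abs_power(A, r - s)).real
        # Relative amount by which the left-hand side exceeds the right-hand side.
        worst_gap = max(worst_gap, (lhs - rhs) / max(1.0, abs(rhs)))
```

A value of `worst_gap` that is nonpositive (up to rounding) is consistent with the proposition.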


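7.2. Proof of the Matrix Khintchine Inequality. We may now establish Proposition 7.1. Let us introduce notation for the quantity of interest:

\[ E^{2p} := \mathbb{E} \operatorname{tr} X^{2p}. \]

Use the integration by parts result, Lemma 6.1, to rewrite the trace moment:

\[ E^{2p} = \sum_{q=0}^{2p-2} \sum_{i=1}^n \mathbb{E} \operatorname{tr} \big[ H_i X^q H_i X^{2p-2-q} \big]. \]

For each choice of $q$, apply the matrix inequality from Proposition 7.3 with $r = 2p - 2$ and $s = 0$ to reach

\[ E^{2p} \le (2p - 1) \sum_{i=1}^n \mathbb{E} \operatorname{tr} \big[ H_i^2 X^{2(p-1)} \big] = (2p - 1) \cdot \mathbb{E} \operatorname{tr} \big[ V X^{2(p-1)} \big]. \]

We have identified the matrix variance $V$ defined in (7.1).

Next, let us identify a copy of $E$ on the right-hand side and solve the resulting algebraic inequality. To that end, invoke Hölder's inequality (5.1) for the trace with $\varrho = p$ and $\varrho' = p/(p-1)$:

\[
\begin{aligned}
E^{2p} &\le (2p - 1) \cdot (\operatorname{tr} V^p)^{1/p} \cdot \mathbb{E} \big( \operatorname{tr} X^{2p} \big)^{(p-1)/p} \\
&\le (2p - 1) \cdot \sigma_{2p}^2 \cdot \big( \mathbb{E} \operatorname{tr} X^{2p} \big)^{(p-1)/p} = (2p - 1) \cdot \sigma_{2p}^2 \cdot E^{2(p-1)}.
\end{aligned}
\]

We have identified the quantity $\sigma_{2p}$ from (7.1). The second inequality is Lyapunov's. Since the unknown $E$ is nonnegative, we can solve the polynomial inequality to reach

\[ E \le \sqrt{2p - 1} \cdot \sigma_{2p}. \]

This is the required result.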
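
As a sanity check on the constant, it may help to specialize to the scalar case. The worked computation below assumes $d = 1$, $n = 1$, and $H_1 = 1$ (a choice made here for illustration), so that $X = \gamma$ is a single standard normal variable, $V = 1$, and $\sigma_{2p} = 1$.

```latex
\[
\mathbb{E}\, \gamma^{2p} = (2p-1)!! = \prod_{k=1}^{p} (2k - 1) \le (2p - 1)^{p},
\qquad\text{so}\qquad
\big( \mathbb{E}\, \gamma^{2p} \big)^{1/(2p)} \le \sqrt{2p - 1} = \sqrt{2p - 1} \cdot \sigma_{2p}.
\]
```

In this special case the bound of Proposition 7.1 is recovered directly from the Gaussian moment formula, with equality when $p = 1$.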


8. A Second-Order Matrix Khintchine Inequality 

In this section, we prove Theorem 3.5, the second-order matrix Khintchine inequality. Let us restate the result in the form in which we will establish it.

Theorem 8.1 (Second-Order Matrix Khintchine). Let $X := \sum_{i=1}^n \gamma_i H_i$ be an Hermitian matrix Gaussian series, as in (2.1). Define the matrix variance and standard deviation parameter

\[ V := \operatorname{Var}(X) = \sum_{i=1}^n H_i^2 \quad\text{and}\quad \sigma_{2p} := (\operatorname{tr} V^p)^{1/(2p)} \quad\text{for } p \ge 1. \tag{8.1} \]

Define the matrix alignment parameter

\[ w_{2p} := \max_{Q_\ell} \Big( \operatorname{tr} \Big| \sum_{i,j=1}^n H_i Q_1 H_j Q_2 H_i Q_3 H_j \Big|^{p/2} \Big)^{1/(2p)} \quad\text{for } p \ge 1, \tag{8.2} \]

where the maximum ranges over a triple $(Q_1, Q_2, Q_3)$ of unitary matrices. Then, for each integer $p \ge 3$,

\[ \big( \mathbb{E} \operatorname{tr} X^{2p} \big)^{1/(2p)} \le 3 \sqrt[4]{2p - 5} \cdot \sigma_{2p} + \sqrt{2p - 4} \cdot w_{2p}. \tag{8.3} \]

The proof of Theorem 8.1 will occupy us for the rest of the section. To reach the statement in the introduction, we rewrite traces in terms of Schatten norms. We also provide the proof of Proposition 3.2 in Section 8.9.

8.1. Discussion. Before we establish Theorem 8.1, let us spend a moment to discuss the proof of this result. Theorem 8.1 is based on the same pattern of argument as the matrix Khintchine inequality, Proposition 7.1. This time, we apply Proposition 7.3 more surgically to control the terms in the trace moment identity from Lemma 6.1. The most significant new observation is that we can use complex interpolation to reorganize the products of matrices that arise during the calculation.

We can refine this argument in several ways. First, if we apply complex interpolation with more care, it is possible to define the matrix alignment parameter (8.2) as a maximum over the set

\[ \big\{ (Q_1, Q_2, Q_3) : \text{the } Q_\ell \text{ are commuting unitaries and } Q_\ell = I \text{ for some } \ell \big\}. \]

Given that commuting matrices are simultaneously diagonalizable, this improvement might make it easier to bound the matrix alignment parameters.

Second, it is quite clear from the proof that we can proceed beyond the second-order terms. For example, for an integer $p \ge 3$, we can obtain results in terms of the third-order quantities

\[
\begin{aligned}
w_{2p,1} &:= \max_{Q_\ell} \Big( \operatorname{tr} \Big| \sum_{i,j,k=1}^n H_i Q_1 H_j Q_2 H_k Q_3 H_i Q_4 H_j Q_5 H_k \Big|^{p/3} \Big)^{1/(2p)}, \\
w_{2p,2} &:= \max_{Q_\ell} \Big( \operatorname{tr} \Big| \sum_{i,j,k=1}^n H_i Q_1 H_j Q_2 H_k Q_3 H_i Q_4 H_k Q_5 H_j \Big|^{p/3} \Big)^{1/(2p)}.
\end{aligned}
\]

The ordering of indices is $(i, j, k, i, j, k)$ and $(i, j, k, i, k, j)$, respectively. This refinement allows us to reduce the order of the coefficient on the standard deviation term $\sigma_{2p}$ in (8.3) to $p^{1/6}$. Unfortunately, we must also compute both alignment parameters $w_{2p,1}$ and $w_{2p,2}$, instead of just $w_{2p}$. This observation shows why it is unproductive to press forward with this approach. Indeed, the number of orderings of indices grows super-exponentially as we consider longer products, which is an awful prospect for applications.

8.2. Preliminaries. In the proof of Theorem 8.1, we will use two interpolation results to reorganize products of matrices. The first one is a type of matrix Hölder inequality [LP86, Cor. 1]. Here is a version of the result specialized to our setting.

Fact 8.2 (Lust-Piquard). Consider a finite sequence $(A_1, \dots, A_n)$ of Hermitian matrices with the same dimension, and let $B$ be a positive-semidefinite matrix of the same dimension. For each number $\varrho \ge 2$,

\[ \Big( \operatorname{tr} \Big( \sum_{i=1}^n A_i B A_i \Big)^{\varrho/2} \Big)^{2/\varrho} \le \Big( \operatorname{tr} \Big( \sum_{i=1}^n A_i^2 \Big)^{\varrho} \Big)^{1/\varrho} \cdot \big( \operatorname{tr} B^{\varrho} \big)^{1/\varrho}. \]

See [PX97, Lem. 1.1] for a proof based on the Hadamard Three-Lines Theorem [Gar07, Prop. 9.1.1].

The second result is a more complicated interpolation for a multilinear function whose arguments are powers 
of random matrices. 

Proposition 8.3 (Multilinear Interpolation). Suppose that $F : (\mathbb{M}_d)^k \to \mathbb{C}$ is a multilinear function. Fix nonnegative integers $\alpha_1, \dots, \alpha_k$ with $\sum_{i=1}^k \alpha_i = \alpha$. Let $Y_i \in \mathbb{H}_d$ be random matrices, not necessarily independent, for which $\mathbb{E} \|Y_i\|^{\alpha} < \infty$. Then

\[ \big| \mathbb{E}\, F(Y_1^{\alpha_1}, \dots, Y_k^{\alpha_k}) \big| \le \max_{i=1,\dots,k} \mathbb{E} \max_{Q_\ell} \big| F(Q_1, \dots, Q_{i-1}, Q_i Y_i^{\alpha}, Q_{i+1}, \dots, Q_k) \big|. \]

In this expression, each $Q_\ell$ is a (random) unitary matrix that commutes with $Y_\ell$.

As with Fact 8.2, the proof of Proposition 8.3 depends on the Hadamard Three-Lines Theorem [Gar07, Prop. 9.1.1]. The argument is standard but somewhat involved, so we postpone the details to Appendix A.

8.3. The Overture. Let us commence with the proof of Theorem 8.1. The initial steps are similar to the argument that leads to the matrix Khintchine inequality, Proposition 7.1. Introduce notation for the quantity of interest:

\[ E^{2p} := \mathbb{E} \operatorname{tr} X^{2p} = \sum_{q=0}^{2p-2} \sum_{i=1}^n \mathbb{E} \operatorname{tr} \big[ H_i X^q H_i X^{2p-2-q} \big]. \tag{8.4} \]

The identity follows from the integration by parts result, Lemma 6.1.

This time, we make finer estimates for the summands in (8.4). Apply Proposition 7.3 with $s = 0$ to the terms where $q \in \{0, 1, 2p-3, 2p-2\}$. For the remaining $2p - 5$ values of the exponent $q$, apply Proposition 7.3 with $s = 2$. We reach the bound

\[ E^{2p} \le 4 \sum_{i=1}^n \mathbb{E} \operatorname{tr} \big[ H_i^2 X^{2p-2} \big] + (2p - 5) \sum_{i=1}^n \mathbb{E} \operatorname{tr} \big[ H_i X^2 H_i X^{2p-4} \big]. \tag{8.5} \]

We can take advantage of the fact that the $H_i$ are interleaved with the powers of the random matrix $X$ in the second term.
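
8.4. The First Term. To treat the first term on the right-hand side of (8.5), simply repeat the arguments from Section 7.2 to obtain a bound in terms of the quantity $E$. We have

\[ \sum_{i=1}^n \mathbb{E} \operatorname{tr} \big[ H_i^2 X^{2p-2} \big] = \mathbb{E} \operatorname{tr} \big[ V X^{2p-2} \big] \le (\operatorname{tr} V^p)^{1/p} \cdot \big( \mathbb{E} \operatorname{tr} X^{2p} \big)^{(p-1)/p} = \sigma_{2p}^2 \cdot E^{2(p-1)}. \tag{8.6} \]

The quantities $V$ and $\sigma_{2p}$ are defined in (8.1), and we have identified a copy of $E$.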


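8.5. Integration by Parts, Again. To continue, we want to break down the matrix $X^2$ that appears in the second term on the right-hand side of (8.5). To do so, we perform another Gaussian integration by parts. Write $X^2 = \sum_{j=1}^n \gamma_j H_j X$, and invoke Fact 6.2 to obtain

\[
\begin{aligned}
\sum_{i=1}^n \mathbb{E} \operatorname{tr} \big[ H_i X^2 H_i X^{2p-4} \big] &= \sum_{i,j=1}^n \mathbb{E} \big[ \gamma_j \cdot \operatorname{tr} [ H_i H_j X H_i X^{2p-4} ] \big] \\
&= \sum_{i,j=1}^n \mathbb{E} \operatorname{tr} \big[ H_i H_j^2 H_i X^{2p-4} \big] + \sum_{i,j=1}^n \sum_{r=0}^{2p-5} \mathbb{E} \operatorname{tr} \big[ H_i H_j X H_i X^{r} H_j X^{2p-5-r} \big].
\end{aligned}
\tag{8.7}
\]

This result follows from the product rule and the formula (6.2) for the derivative of a power. We will bound the first term on the right-hand side of (8.7) in terms of the standard deviation parameter $\sigma_{2p}$, and the second term will lead to the matrix alignment parameter $w_{2p}$.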


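8.6. Finding the Standard Deviation Parameter. Let us address the first term on the right-hand side of (8.7). First, draw the sum back into the trace and identify the matrix variance $V$, defined in (8.1):

\[ \sum_{i,j=1}^n \mathbb{E} \operatorname{tr} \big[ H_i H_j^2 H_i X^{2p-4} \big] = \mathbb{E} \operatorname{tr} \Big[ \Big( \sum_{i=1}^n H_i V H_i \Big) X^{2(p-2)} \Big]. \]

To isolate the random matrix $X$, apply Hölder's inequality (5.1) with exponents $\varrho = p/2$ and $\varrho' = p/(p-2)$, and follow up with Lyapunov's inequality. Thus,

\[ \mathbb{E} \operatorname{tr} \Big[ \Big( \sum_{i=1}^n H_i V H_i \Big) X^{2(p-2)} \Big] \le \Big( \operatorname{tr} \Big( \sum_{i=1}^n H_i V H_i \Big)^{p/2} \Big)^{2/p} \cdot \big( \mathbb{E} \operatorname{tr} X^{2p} \big)^{(p-2)/p}. \]

The Lust-Piquard inequality, Fact 8.2, with $\varrho = p$ implies that

\[ \Big( \operatorname{tr} \Big( \sum_{i=1}^n H_i V H_i \Big)^{p/2} \Big)^{2/p} \le \Big( \operatorname{tr} \Big( \sum_{i=1}^n H_i^2 \Big)^{p} \Big)^{1/p} \cdot (\operatorname{tr} V^p)^{1/p} = (\operatorname{tr} V^p)^{2/p} = \sigma_{2p}^4. \]

Once again, we identified $V$ and $\sigma_{2p}$ from (8.1). Combine the last three displays to arrive at

\[ \sum_{i,j=1}^n \mathbb{E} \operatorname{tr} \big[ H_i H_j^2 H_i X^{2p-4} \big] \le \sigma_{2p}^4 \cdot \big( \mathbb{E} \operatorname{tr} X^{2p} \big)^{(p-2)/p} = \sigma_{2p}^4 \cdot E^{2(p-2)}. \tag{8.8} \]

We have identified another copy of $E$.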


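8.7. Finding the Matrix Alignment Parameter. It remains to study the second term on the right-hand side of (8.7). Rearranging the sums, we write this object as

\[ \sum_{r=0}^{2p-5} \sum_{i,j=1}^n \mathbb{E} \operatorname{tr} \big[ H_i H_j X H_i X^{r} H_j X^{2p-5-r} \big]. \]

We can apply the interpolation result, Proposition 8.3, to consolidate the powers of the random matrix $X$. Consider the multilinear function

\[ F(A_1, A_2, A_3) := \sum_{i,j=1}^n \operatorname{tr} \big[ H_i H_j A_1 H_i A_2 H_j A_3 \big]. \]

Since $X$ is a matrix Gaussian series, it has moments of all orders. Therefore, for each index $r$,

\[
\begin{aligned}
&\Big| \sum_{i,j=1}^n \mathbb{E} \operatorname{tr} \big[ H_i H_j X H_i X^{r} H_j X^{2p-5-r} \big] \Big| \\
&\qquad \le \max \Big\{ \mathbb{E} \max_{Q_\ell} |F(Q_1 X^{2p-4}, Q_2, Q_3)|,\ \mathbb{E} \max_{Q_\ell} |F(Q_1, Q_2 X^{2p-4}, Q_3)|,\ \mathbb{E} \max_{Q_\ell} |F(Q_1, Q_2, Q_3 X^{2p-4})| \Big\}.
\end{aligned}
\]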
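
All three terms in the maximum admit the same bound, so we may as well consider the third one:

\[
\begin{aligned}
\mathbb{E} \max_{Q_\ell} |F(Q_1, Q_2, Q_3 X^{2p-4})| &= \mathbb{E} \max_{Q_\ell} \Big| \sum_{i,j=1}^n \operatorname{tr} \big[ H_i H_j Q_1 H_i Q_2 H_j Q_3 X^{2(p-2)} \big] \Big| \\
&= \mathbb{E} \max_{Q_\ell} \Big| \operatorname{tr} \Big[ \Big( \sum_{i,j=1}^n Q_3 H_i H_j Q_1 H_i Q_2 H_j \Big) X^{2(p-2)} \Big] \Big| \\
&\le \mathbb{E} \max_{Q_\ell} \Big( \operatorname{tr} \Big| \sum_{i,j=1}^n H_i H_j Q_1 H_i Q_2 H_j \Big|^{p/2} \Big)^{2/p} \cdot \big( \operatorname{tr} X^{2p} \big)^{(p-2)/p} \\
&\le \Big[ \max_{Q_\ell} \Big( \operatorname{tr} \Big| \sum_{i,j=1}^n H_i H_j Q_1 H_i Q_2 H_j \Big|^{p/2} \Big)^{2/p} \Big] \cdot \big( \mathbb{E} \operatorname{tr} X^{2p} \big)^{(p-2)/p}.
\end{aligned}
\]

The first step is the definition of $F$. To reach the second line, we use the fact that $Q_3$ commutes with $X$, then we cycle the trace. The third line is Hölder's inequality (5.1) with $\varrho = p/2$ and $\varrho' = p/(p-2)$, and we have used the left unitary invariance of the matrix absolute value to delete $Q_3$. Next, take the maximum over all unitary matrices, and apply Lyapunov's inequality to draw the expectation into the term involving $X$. Finally, identify the quantity $E$ and note that the maximum is bounded by the alignment parameter $w_{2p}$, defined in (8.2). Similar calculations are valid for the other two terms, whence

\[ \Big| \sum_{i,j=1}^n \mathbb{E} \operatorname{tr} \big[ H_i H_j X H_i X^{r} H_j X^{2p-5-r} \big] \Big| \le w_{2p}^4 \cdot E^{2(p-2)}. \]

Since there are $2p - 4$ possible choices of $r$, we determine that

\[ \sum_{r=0}^{2p-5} \sum_{i,j=1}^n \mathbb{E} \operatorname{tr} \big[ H_i H_j X H_i X^{r} H_j X^{2p-5-r} \big] \le (2p - 4) \cdot w_{2p}^4 \cdot E^{2(p-2)}. \tag{8.9} \]

The main part of the argument is finished.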


8.8. Putting the Pieces Together. To conclude, we merge the bounds we have obtained and solve the resulting inequality for the quantity $E$. Combine (8.5), (8.6), (8.7), (8.8), and (8.9) to reach

\[ E^{2p} \le 4 \sigma_{2p}^2 \cdot E^{2(p-1)} + (2p - 5) \cdot \big[ \sigma_{2p}^4 + (2p - 4) \cdot w_{2p}^4 \big] \cdot E^{2(p-2)}. \]

Clearing factors of $E$, we reach the inequality

\[ E^{4} \le 4 \sigma_{2p}^2 \cdot E^{2} + (2p - 5) \cdot \big[ \sigma_{2p}^4 + (2p - 4) \cdot w_{2p}^4 \big]. \]

If $\alpha$ and $\beta$ are nonnegative numbers, each nonnegative solution to the quadratic inequality $t^2 \le \alpha t + \beta$ must satisfy $t \le \alpha + \sqrt{\beta}$. It follows that

\[ E^{2} \le 4 \sigma_{2p}^2 + \sqrt{2p - 5} \cdot \big[ \sigma_{2p}^4 + (2p - 4) \cdot w_{2p}^4 \big]^{1/2}. \]

Take the square root, and invoke subadditivity of the square root (twice) to reach

\[ E \le \big( 2 + \sqrt[4]{2p - 5} \big) \cdot \sigma_{2p} + \sqrt[4]{(2p - 5)(2p - 4)} \cdot w_{2p}. \]

Finally, we simplify the numerical constants to arrive at (8.3).
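
For completeness, here is one way to fill in the two elementary steps used in this subsection: the bound on solutions of the quadratic inequality, and the final simplification of constants. The derivation is a sketch written for this exposition; it uses only subadditivity of the square root and the assumption $p \ge 3$.

```latex
\[
t^2 \le \alpha t + \beta
\quad\Longrightarrow\quad
t \le \frac{\alpha + \sqrt{\alpha^2 + 4\beta}}{2}
\le \frac{\alpha + \alpha + 2\sqrt{\beta}}{2}
= \alpha + \sqrt{\beta}.
\]
\[
p \ge 3
\quad\Longrightarrow\quad
2 + \sqrt[4]{2p - 5} \le 3 \sqrt[4]{2p - 5}
\quad\text{and}\quad
\sqrt[4]{(2p - 5)(2p - 4)} \le \sqrt{2p - 4}.
\]
```

The first implication uses the quadratic formula for the larger root of $t^2 - \alpha t - \beta$. The second uses $2p - 5 \ge 1$ and $2p - 5 \le 2p - 4$.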


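8.9. Comparison of Standard Deviation and Alignment Parameters. Our last task in this section is to establish Proposition 3.2, which states that the alignment parameter $w_{2p}$ never exceeds the standard deviation $\sigma_{2p}$. The easiest way to obtain this result is to use block matrices and inequalities for the Schatten norm.

Fix an integer $p \ge 2$, and fix a triple $(Q_1, Q_2, Q_3)$ of unitary matrices. Consider the quantity

\[ S := \Big( \operatorname{tr} \Big| \sum_{i,j=1}^n H_i Q_1 H_j Q_2 H_i Q_3 H_j \Big|^{p/2} \Big)^{2/p}. \]

To establish Proposition 3.2, it suffices to show that $S \le \sigma_{2p}^4$. Using block matrices and converting the trace into a Schatten norm, we can write

\[ S = \left\| \begin{bmatrix} \vdots \\ Q_2^* H_j Q_1^* H_i \\ \vdots \end{bmatrix}^{*} \begin{bmatrix} \vdots \\ H_i Q_3 H_j \\ \vdots \end{bmatrix} \right\|_{p/2}. \]

The entries of the block column matrices are indexed by pairs $(i, j)$, arranged in lexicographic order. Invoke the Cauchy-Schwarz inequality (5.2) for Schatten norms with $\varrho = p/2$:

\[ S^2 \le \left\| \begin{bmatrix} \vdots \\ Q_2^* H_j Q_1^* H_i \\ \vdots \end{bmatrix}^{*} \begin{bmatrix} \vdots \\ Q_2^* H_j Q_1^* H_i \\ \vdots \end{bmatrix} \right\|_{p/2} \cdot \left\| \begin{bmatrix} \vdots \\ H_i Q_3 H_j \\ \vdots \end{bmatrix}^{*} \begin{bmatrix} \vdots \\ H_i Q_3 H_j \\ \vdots \end{bmatrix} \right\|_{p/2}. \]

Write each product of two block matrices as a sum:

\[ S^2 \le \Big\| \sum_{i,j=1}^n H_i Q_1 H_j^2 Q_1^* H_i \Big\|_{p/2} \cdot \Big\| \sum_{i,j=1}^n H_j Q_3^* H_i^2 Q_3 H_j \Big\|_{p/2}. \]

The two factors have the same form, so it suffices to bound the first one. Indeed,

\[ \Big\| \sum_{i,j=1}^n H_i Q_1 H_j^2 Q_1^* H_i \Big\|_{p/2} = \Big\| \sum_{i=1}^n H_i (Q_1 V Q_1^*) H_i \Big\|_{p/2} \le \Big\| \sum_{i=1}^n H_i^2 \Big\|_{p} \cdot \| Q_1 V Q_1^* \|_{p} = \|V\|_p \cdot \|V\|_p = \sigma_{2p}^4. \]

We have identified the matrix variance $V$, defined in (8.1), and then we applied the Lust-Piquard inequality, Fact 8.2, with $\varrho = p$. We identified $V$ again, invoked unitary invariance of the Schatten norm, and then we recognized the quantity $\sigma_{2p}$ from (8.1). In summary, we have established that $S^2 \le \sigma_{2p}^8$. This is what we needed to show.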
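
The comparison between the alignment and standard deviation parameters can also be explored numerically. The sketch below evaluates the quantity inside the maximum in (8.2) at the identity triple and at a handful of random unitary triples, and compares the largest value found with $\sigma_{2p}$. Sampling only gives a lower estimate of $w_{2p}$; the helper names, the sizes, and the choice $p = 3$ are assumptions made for illustration.

```python
import numpy as np

def random_unitary(d, rng):
    z = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
    q, _ = np.linalg.qr(z)
    return q

def random_hermitian(d, rng):
    z = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
    return (z + z.conj().T) / 2

def trace_abs_power(M, t):
    # tr |M|^t, computed from the singular values of M.
    s = np.linalg.svd(M, compute_uv=False)
    return float(np.sum(s ** t))

rng = np.random.default_rng(5)
d, n, p = 4, 3, 3
H = [random_hermitian(d, rng) for _ in range(n)]

# Standard deviation parameter sigma_{2p} = (tr V^p)^(1/(2p)).
V = sum(Hi @ Hi for Hi in H)
sigma_2p = trace_abs_power(V, p) ** (1.0 / (2 * p))

def alignment_value(Q1, Q2, Q3):
    # The quantity inside the maximum in (8.2) for one fixed triple.
    W = sum(Hi @ Q1 @ Hj @ Q2 @ Hi @ Q3 @ Hj for Hi in H for Hj in H)
    return trace_abs_power(W, p / 2) ** (1.0 / (2 * p))

I = np.eye(d)
triples = [(I, I, I)]
triples += [tuple(random_unitary(d, rng) for _ in range(3)) for _ in range(50)]
best_found = max(alignment_value(*t) for t in triples)
```

Every sampled value should stay below $\sigma_{2p}$, in agreement with Proposition 3.2.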


9. Second-Order Matrix Khintchine under Strong Isotropy 


In this section, we prove an extension of Theorem 3.11 that gives both lower and upper bounds for the trace moments of a strongly isotropic matrix Gaussian series.

Theorem 9.1 (Second-Order Matrix Khintchine under Strong Isotropy). Let $X := \sum_{i=1}^n \gamma_i H_i$ be an Hermitian matrix Gaussian series, as in (2.1). Assume that $X$ has the strong isotropy property

\[ \mathbb{E} X^{p} = \big( \mathbb{E}\, \bar{\operatorname{tr}}\, X^{p} \big) \cdot I \quad\text{for each } p = 0, 1, 2, \dots. \tag{9.1} \]

Define the matrix standard deviation parameter and matrix alignment parameter

\[ \sigma := \Big\| \sum_{i=1}^n H_i^2 \Big\|^{1/2} \quad\text{and}\quad w := \max_{Q_\ell} \Big\| \sum_{i,j=1}^n H_i Q_1 H_j Q_2 H_i Q_3 H_j \Big\|^{1/4}. \tag{9.2} \]

The maximum ranges over a triple $(Q_1, Q_2, Q_3)$ of unitary matrices. Then, for each integer $p \ge 1$,

Cat^'‘'P^a-[l - (pui//7)4]y‘"'” < ■ a + p^<^ ■ w. 

The lower bound also requires that p^^^w < 0.7a. We have written Catp for the pth Catalan number, the function 
$[a]_+ := \max\{a, 0\}$, and $\bar{\operatorname{tr}}$ is the normalized trace.

The proof of this result appears below, starting in Section 9.3. To reach the statement of Theorem 3.11 in the introduction, we rewrite normalized traces in terms of Schatten norms. Fact 9.2 (below) states that the Catalan numbers satisfy the bound $\mathrm{Cat}_p \le 4^p$ for each $p = 1, 2, 3, \dots$, which gives an explicit numerical form for the upper bound.


9.1. Discussion. Before we establish Theorem 9.1, let us comment on the proof and the meaning of this result. The 
most important observation is that the estimate is extremely accurate, at least for some examples. In particular, for 
the GOE matrix $X_{\mathrm{goe}}$ defined in (2.7), we showed in Section 4.2 that the standard deviation parameter $\sigma \approx 1$ while the alignment parameter $w \approx d^{-1/4}$. Therefore, Theorem 9.1 implies that 

EtrXg^e ~ Catp when p « 

This estimate is sufficient to prove that the limiting spectral distribution of the GOE is the semicircle law. See [Tao12, Sec. 2.3] for details about how to derive the law from the trace moments. Furthermore, Markov's inequality implies that the norm $\|X_{\mathrm{goe}}\| \approx 2$ with high probability.

The proof of Theorem 9.1 has a lot in common with the arguments leading up to Proposition 7.1 and Theo¬ 
rem 8.1. The main innovation is that we can use the strong isotropy to imitate a moment identity that would hold 
in free probability. This idea allows us to remove the dependence on p from the standard deviation term. 

Although it may seem that the proof requires the matrix $X$ to be a Gaussian series, there are analogous techniques, based on the theory of exchangeable pairs [MJC+14], that allow us to deal with other types of random matrix series. This observation has the potential to lead to universality laws. It is also clear from the argument that we could prove related results with an approximate form of strong isotropy.

In addition, it is possible to extend these ideas to a rectangular matrix Gaussian series $Z := \sum_{i=1}^n \gamma_i S_i$. In this case, we consider the Hermitian dilation:

\[ \mathscr{H}(Z) := \begin{bmatrix} 0 & Z \\ Z^* & 0 \end{bmatrix}. \]

The correct analog of strong isotropy is that

\[ \mathbb{E}\, \mathscr{H}(Z)^{2p} = \begin{bmatrix} \big( \mathbb{E}\, \bar{\operatorname{tr}}\, (ZZ^*)^p \big) \cdot I & 0 \\ 0 & \big( \mathbb{E}\, \bar{\operatorname{tr}}\, (Z^*Z)^p \big) \cdot I \end{bmatrix} \quad\text{for } p = 0, 1, 2, \dots. \]

This observation allows us to obtain sharp bounds for the trace moments of rectangular Gaussian matrices. In this fashion, we can even show that the limiting spectral density of a sequence of rectangular Gaussian matrices is the Marčenko-Pastur distribution, provided that the aspect ratio of the sequence is held constant.

Finally, we remark that similar arguments can be applied to obtain algebraic relations for the Stieltjes transform of the matrix $X$. This approach may lead more directly to limit laws for sequences of random matrices with increasing dimension. See [AGZ10, Sec. 2.4.1] or [Kem13, Sec. 9] for an argument of this species.


9.2. Preliminaries. Aside from the results we have collected so far, the proof of Theorem 9.1 requires a few additional ingredients. First, we state some of the basic and well-known properties of the Catalan numbers.

Fact 9.2 (Catalan Numbers). The $p$th Catalan number is defined by the formula

\[ \mathrm{Cat}_p := \frac{1}{p+1} \binom{2p}{p} \quad\text{for } p = 0, 1, 2, \dots. \tag{9.3} \]

In particular, $p \mapsto \mathrm{Cat}_p$ is nondecreasing, and $\mathrm{Cat}_p \le 4^p$ for each $p$. The Catalan numbers satisfy the recursion

\[ \mathrm{Cat}_0 = 1 \quad\text{and}\quad \mathrm{Cat}_{p+1} = \sum_{q=0}^{p} \mathrm{Cat}_q \, \mathrm{Cat}_{p-q}. \tag{9.4} \]
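
The properties listed in Fact 9.2 are easy to confirm for small indices. The following sketch computes the Catalan numbers both from the closed form (9.3) and from the recursion (9.4); the function names and the cutoff of twelve terms are choices made here for illustration.

```python
from math import comb

def catalan_closed_form(p):
    # Formula (9.3): Cat_p = binom(2p, p) / (p + 1), which is always an integer.
    return comb(2 * p, p) // (p + 1)

def catalan_by_recursion(p_max):
    # Recursion (9.4): Cat_0 = 1 and Cat_{p+1} = sum_{q=0}^{p} Cat_q * Cat_{p-q}.
    cat = [1]
    for p in range(p_max):
        cat.append(sum(cat[q] * cat[p - q] for q in range(p + 1)))
    return cat

values = catalan_by_recursion(12)
closed = [catalan_closed_form(p) for p in range(13)]
```

The two lists should coincide, and each entry should be bounded by $4^p$.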

The next result is a covariance identity for a product of centered functions of a Gaussian vector [NP12, Thm. 2.9.1]. It can be regarded as a refinement of the Poincaré inequality, which provides a bound for the variance of a centered function of a Gaussian vector.

Fact 9.3 (Gaussian Covariance Identity). Let $\gamma, \gamma' \in \mathbb{R}^n$ be independent standard normal vectors. Let $f, g : \mathbb{R}^n \to \mathbb{C}$ be functions whose derivatives are square integrable with respect to the standard normal measure, and assume that $\mathbb{E} f(\gamma) = \mathbb{E}\, g(\gamma) = 0$. Then

\[ \mathbb{E}[ f(\gamma) \cdot g(\gamma) ] = \int_0^\infty \frac{\mathrm{d}t}{\mathrm{e}^{t}} \sum_{j=1}^n \mathbb{E} \big[ (\partial_j f)(\gamma) \cdot (\partial_j g)(\gamma_t) \big] \quad\text{where } \gamma_t := \mathrm{e}^{-t} \gamma + \sqrt{1 - \mathrm{e}^{-2t}}\, \gamma'. \]

The symbol $\partial_j$ refers to differentiation with respect to the $j$th coordinate.

The usual statement of this result involves the Ornstein-Uhlenbeck semigroup, but we have given a more elementary formulation.

Finally, we need a bound for the solution to a certain type of polynomial inequality. This estimate is related to Fujiwara's inequality [Mar66, Sec. 27]. We include a proof sketch since we could not locate the precise statement in the literature.


Proposition 9.4 (Polynomial Inequalities). Consider an integer $k \ge 3$, and fix positive numbers $\alpha$ and $\beta$. Then

\[ u^{k} \le \alpha + \beta u^{k-2} \quad\text{implies}\quad u \le \alpha^{1/k} + \beta^{1/2}. \]

Proof Sketch. Consider the polynomial $\varphi : u \mapsto u^{k} - \beta u^{k-2} - \alpha$. The Descartes Rule of Signs implies that $\varphi$ has exactly one positive root, say $u_+$. Furthermore, $\varphi(u) \le 0$ for a positive number $u$ if and only if $u \le u_+$ because $\varphi(0) < 0$. By direct calculation, one may verify that $u_* := \alpha^{1/k} + \beta^{1/2}$ satisfies $\varphi(u_*) > 0$, which means that $u_+ < u_*$. We conclude that $\varphi(u) \le 0$ implies $u \le u_+ < u_*$. □
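
The bound in Proposition 9.4 can be compared with the exact positive root of the polynomial for a few parameter choices. The sketch below locates the root by bisection; the function name `positive_root` and the parameter grid are assumptions made for illustration.

```python
def positive_root(k, alpha, beta, tol=1e-12):
    # Bisection for the unique positive root of u^k - beta*u^(k-2) - alpha.
    def phi(u):
        return u ** k - beta * u ** (k - 2) - alpha

    lo, hi = 0.0, 1.0
    while phi(hi) <= 0:          # grow the bracket until phi changes sign
        hi *= 2.0
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if phi(mid) <= 0:
            lo = mid
        else:
            hi = mid
    return hi                    # upper end of the bracket, at least the true root

# Slack between the claimed bound and the computed root, per parameter choice.
slacks = []
for k in (3, 4, 7, 10):
    for alpha in (0.1, 1.0, 25.0):
        for beta in (0.1, 1.0, 10.0):
            bound = alpha ** (1.0 / k) + beta ** 0.5
            slacks.append(bound - positive_root(k, alpha, beta))

min_slack = min(slacks)
```

A positive value of `min_slack` indicates that the root stays below $\alpha^{1/k} + \beta^{1/2}$ for every tested case.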

9.3. The Normalized Trace Moments. Let us commence the proof of Theorem 9.1. First, we introduce notation for the normalized trace moments of the matrix Gaussian series:

\[ \mu_p := \mathbb{E}\, \bar{\operatorname{tr}}\, X^{p} \quad\text{for } p = 0, 1, 2, \dots. \]

It is clear that $\mu_0 = 1$. Since $X$ is a symmetric random variable, the odd trace moments are zero:

\[ \mu_{2p+1} = 0 \quad\text{for } p = 0, 1, 2, \dots. \]

It remains to calculate the even trace moments.

We can obtain the second moment from a simple argument:

\[ \sum_{i=1}^n H_i^2 = \mathbb{E} X^2 = \big( \mathbb{E}\, \bar{\operatorname{tr}}\, X^2 \big) \cdot I = \mu_2 \cdot I. \tag{9.5} \]

The first identity follows from a direct calculation using the definition (2.1) of the matrix Gaussian series. The second identity is the strong isotropy hypothesis (9.1), and the last relation is the definition of $\mu_2$. Take the spectral norm of (9.5) to see that

\[ \sigma^2 = \Big\| \sum_{i=1}^n H_i^2 \Big\| = \mu_2. \tag{9.6} \]

We have identified the standard deviation parameter, defined in (9.2).

9.4. Representationof Higher-Order Moments. The major challenge is to compute the rest of the even moments. 
As usual, the first step is to invoke Gaussian integration by parts. For each integer p > 1, Lemma 6.1 implies that 

M2(p+1) = iXo Lh Etr 

We are considering P 2 {p+i) instead of p 2 p because it makes the argument cleaner. To analyze this expression, we 
win examine each index q separately and subject each one to the same treatment. 

Fix an index 0< q < 2p. First, we center both and X^P“4 by adding and subtracting their expectations: 

Efr [HiX‘f HiX^P-‘f] = X"=i tr [Hi{EX‘f)Hi{EX^P-‘’)] 

+ E”=i Etr [Hi{X‘l - EX‘l)Hi{X^P-‘> - EX^P-^)]. 

The cross-terms vanish because each one has zero mean. It is productive to think of the first sum on the right-hand 
side as an approximation to the left-hand side, while the second sum is a perturbation. 

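Let us focus on the first sum on the right-hand side of the last display. We can use the strong isotropy hypothesis (9.1)
to simplify this expression:

\[
\begin{aligned}
\sum_{i=1}^n \operatorname{\bar{tr}} \big[ H_i (\mathbb{E} X^q) H_i (\mathbb{E} X^{2p-q}) \big]
&= \sum_{i=1}^n \operatorname{\bar{tr}} \big[ H_i \big( (\mathbb{E}\,\operatorname{\bar{tr}} X^q) \cdot \mathbf{I} \big) H_i \big( (\mathbb{E}\,\operatorname{\bar{tr}} X^{2p-q}) \cdot \mathbf{I} \big) \big] \\
&= \operatorname{\bar{tr}} \Big( \sum_{i=1}^n H_i^2 \Big) \, (\mathbb{E}\,\operatorname{\bar{tr}} X^q)(\mathbb{E}\,\operatorname{\bar{tr}} X^{2p-q}) \\
&= \sigma^2 \cdot \mu_q \mu_{2p-q}.
\end{aligned}
\]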

The last identity follows from (9.5) and (9.6). As a side note, our motivation here is to imitate the moment identity 
that would hold if X and Hi were free from each other, in the sense of free probability. 

Finally, we combine the last three displays to reach 

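\[ \mu_{2(p+1)} = \sigma^2 \sum_{q=0}^{p} \mu_{2q} \mu_{2(p-q)} + \sum_{q=1}^{2p-1} \sum_{i=1}^n \mathbb{E}\,\operatorname{\bar{tr}} \big[ H_i (X^q - \mathbb{E} X^q) H_i (X^{2p-q} - \mathbb{E} X^{2p-q}) \big]. \tag{9.7} \]

Observe that we have modified the indexing of both sums. This step depends on the facts that $\mu_q = 0$ for odd $q$ and
that $X^0 = \mathbf{I}$.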
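
To see what the leading term of the recursion produces on its own, the following minimal sketch drops the perturbation sum entirely and iterates the remaining recursion. The function names are illustrative, and the integer argument `sigma2` stands in for $\sigma^2$; under this simplification the output should be the Catalan numbers scaled by powers of $\sigma^2$.

```python
from math import comb


def catalan(p: int) -> int:
    """p-th Catalan number, Cat_p = binom(2p, p) / (p + 1)."""
    return comb(2 * p, p) // (p + 1)


def leading_term_moments(sigma2: int, pmax: int) -> list:
    """Iterate m[p+1] = sigma2 * sum_q m[q] * m[p-q], starting from m[0] = 1.

    This is the moment recursion with the perturbation sum set to zero
    (a simplifying assumption); m[p] plays the role of the even moment mu_{2p}.
    """
    m = [1]
    for p in range(pmax):
        m.append(sigma2 * sum(m[q] * m[p - q] for q in range(p + 1)))
    return m


moments = leading_term_moments(sigma2=1, pmax=5)
print(moments)
```

With `sigma2=1` the list coincides with the first Catalan numbers, which is the moment sequence one would see if $X$ and the $H_i$ were free.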

9.5. The Perturbation Term. The next step in the argument is to bound the perturbation term in (9.7) in terms of
the alignment parameter $w$, defined in (9.2). We will use the Gaussian covariance identity, Fact 9.3.

To that end, let us explain how to write each summand in the perturbation term as a covariance. Let $H$ be a real,
diagonal matrix: $H = \operatorname{diag}(h_1, \dots, h_d)$. Expanding the normalized trace, using $\alpha, \beta$ for coordinate indices, we find
that

\[ \mathbb{E}\,\operatorname{\bar{tr}} \big[ H (X^q - \mathbb{E} X^q) H (X^{2p-q} - \mathbb{E} X^{2p-q}) \big] = \frac{1}{d} \sum_{\alpha,\beta=1}^{d} h_\alpha h_\beta \cdot \mathbb{E} \big[ (X^q - \mathbb{E} X^q)_{\alpha\beta} (X^{2p-q} - \mathbb{E} X^{2p-q})_{\beta\alpha} \big]. \]

To apply the Gaussian covariance identity to the expectation, we introduce a parameterized family $\{X_t : t \ge 0\}$ of
random matrices where

\[ X_t := \sum_{i=1}^n \big( e^{-t} \gamma_i + \sqrt{1 - e^{-2t}} \, \gamma_i' \big) \cdot H_i \quad \text{where $\gamma'$ is an independent copy of $\gamma$.} \]
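Observe that $X$ and $X_t$ have the same distribution, although they are dependent. Fact 9.3 and Fact 6.3 deliver

\[
\mathbb{E} \big[ (X^q - \mathbb{E} X^q)_{\alpha\beta} (X^{2p-q} - \mathbb{E} X^{2p-q})_{\beta\alpha} \big]
= \sum_{r=0}^{q-1} \sum_{s=0}^{2p-q-1} \int_0^\infty \frac{\mathrm{d}t}{e^{t}} \sum_{j=1}^n \mathbb{E} \big[ (X^r H_j X^{q-1-r})_{\alpha\beta} (X_t^s H_j X_t^{2p-q-1-s})_{\beta\alpha} \big].
\]

Combining these formulas and expressing the result in terms of the normalized trace again, we find that

\[
\mathbb{E}\,\operatorname{\bar{tr}} \big[ H (X^q - \mathbb{E} X^q) H (X^{2p-q} - \mathbb{E} X^{2p-q}) \big]
= \sum_{r=0}^{q-1} \sum_{s=0}^{2p-q-1} \int_0^\infty \frac{\mathrm{d}t}{e^{t}} \sum_{j=1}^n \mathbb{E}\,\operatorname{\bar{tr}} \big[ H X^r H_j X^{q-1-r} H X_t^s H_j X_t^{2p-q-1-s} \big].
\]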

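In fact, this expression is valid for any Hermitian matrix $H$ because of the unitary invariance of the trace. Summing
the last identity over $H = H_i$, we reach

\[
\sum_{i=1}^n \mathbb{E}\,\operatorname{\bar{tr}} \big[ H_i (X^q - \mathbb{E} X^q) H_i (X^{2p-q} - \mathbb{E} X^{2p-q}) \big]
= \sum_{r=0}^{q-1} \sum_{s=0}^{2p-q-1} \int_0^\infty \frac{\mathrm{d}t}{e^{t}} \sum_{i,j=1}^n \mathbb{E}\,\operatorname{\bar{tr}} \big[ H_i X^r H_j X^{q-1-r} H_i X_t^s H_j X_t^{2p-q-1-s} \big]. \tag{9.8}
\]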

At this point, the alignment parameter w starts to become visible. 
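
As a brief reminder of why $X_t$ and $X$ share a distribution, suppose the interpolation has the standard Ornstein-Uhlenbeck form, with coefficients written $\gamma_{t,i}$ (a label introduced here only for illustration). A short check of the second moments, under that assumption:

```latex
\begin{align*}
\gamma_{t,i} &:= e^{-t} \gamma_i + \sqrt{1 - e^{-2t}} \, \gamma_i', \\
\mathbb{E}\, \gamma_{t,i}^2 &= e^{-2t} + (1 - e^{-2t}) = 1, \\
\mathbb{E} [ \gamma_i \, \gamma_{t,i} ] &= e^{-t}.
\end{align*}
```

Each coefficient is a centered Gaussian with unit variance, so the family has the same law as $\gamma$, while the covariance $e^{-t}$ records the dependence between $X$ and $X_t$. The weight $e^{-t}$ integrates to one over $t \ge 0$.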


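9.6. Finding the Matrix Alignment Parameter. Our next goal is to control the expression (9.8) in terms of the
alignment parameter $w$. To do so, we use the interpolation result, Proposition 8.3, to bound the sum over $(i, j)$. For
each choice of indices $(q, r, s)$, we obtain the estimate

\[
\begin{aligned}
\Big| \sum_{i,j=1}^n \mathbb{E}\,\operatorname{\bar{tr}} \big[ H_i X^r H_j X^{q-1-r} H_i X_t^s H_j X_t^{2p-q-1-s} \big] \Big|
\le \max \Big\{ & \mathbb{E} \max_{Q_\ell} \Big| \sum_{i,j=1}^n \operatorname{\bar{tr}} \big[ H_i Q_1 X^{2p-2} H_j Q_2 H_i Q_3 H_j Q_4 \big] \Big|, \\
& \mathbb{E} \max_{Q_\ell} \Big| \sum_{i,j=1}^n \operatorname{\bar{tr}} \big[ H_i Q_1 H_j Q_2 X^{2p-2} H_i Q_3 H_j Q_4 \big] \Big|, \\
& \mathbb{E} \max_{Q_\ell} \Big| \sum_{i,j=1}^n \operatorname{\bar{tr}} \big[ H_i Q_1 H_j Q_2 H_i Q_3 X_t^{2p-2} H_j Q_4 \big] \Big|, \\
& \mathbb{E} \max_{Q_\ell} \Big| \sum_{i,j=1}^n \operatorname{\bar{tr}} \big[ H_i Q_1 H_j Q_2 H_i Q_3 H_j Q_4 X_t^{2p-2} \big] \Big| \Big\}.
\end{aligned}
\]

Each random unitary matrix $Q_\ell$ commutes with the corresponding random matrix $X$ or $X_t$.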

As in Section 8.7, we can bound each term in the maximum in the same fashion. For example, consider the 
fourth term: 

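\[
\begin{aligned}
\mathbb{E} \max_{Q_\ell} \Big| \sum_{i,j=1}^n \operatorname{\bar{tr}} \big[ H_i Q_1 H_j Q_2 H_i Q_3 H_j Q_4 X_t^{2p-2} \big] \Big|
&\le \mathbb{E} \max_{Q_\ell} \Big[ \Big\| \sum_{i,j=1}^n H_i Q_1 H_j Q_2 H_i Q_3 H_j \Big\| \cdot \big( \operatorname{\bar{tr}} X_t^{2p-2} \big) \Big] \\
&\le \Big( \max_{Q_\ell} \Big\| \sum_{i,j=1}^n H_i Q_1 H_j Q_2 H_i Q_3 H_j \Big\| \Big) \cdot \big( \mathbb{E}\,\operatorname{\bar{tr}} X_t^{2p-2} \big) \\
&= w^4 \cdot \mu_{2(p-1)}.
\end{aligned}
\]

The first step is Hölder's inequality for the trace, and the second step is Hölder's inequality for the expectation. In
the last line, we recall that $X_t$ has the same distribution as $X$ to identify $\mu_{2(p-1)}$. Finally, we recognize the matrix
alignment parameter $w$, defined in (9.2).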

In summary, we have shown that 

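\[ \Big| \sum_{i,j=1}^n \mathbb{E}\,\operatorname{\bar{tr}} \big[ H_i X^r H_j X^{q-1-r} H_i X_t^s H_j X_t^{2p-q-1-s} \big] \Big| \le w^4 \cdot \mu_{2(p-1)}. \]

Introduce this bound into (9.8) to arrive at

\[ \Big| \sum_{i=1}^n \mathbb{E}\,\operatorname{\bar{tr}} \big[ H_i (X^q - \mathbb{E} X^q) H_i (X^{2p-q} - \mathbb{E} X^{2p-q}) \big] \Big| \le q(2p - q) \cdot w^4 \cdot \mu_{2(p-1)} \le p^2 \cdot w^4 \cdot \mu_{2(p-1)}. \]

We have used the numerical inequality $u(a - u) \le a^2/4$, valid for $u \in \mathbb{R}$. Finally, we sum this expression over the
index $q$ to conclude that

\[ \Big| \sum_{q=1}^{2p-1} \sum_{i=1}^n \mathbb{E}\,\operatorname{\bar{tr}} \big[ H_i (X^q - \mathbb{E} X^q) H_i (X^{2p-q} - \mathbb{E} X^{2p-q}) \big] \Big| \le 2p^3 \cdot w^4 \cdot \mu_{2(p-1)}. \tag{9.9} \]

This is the required bound for the perturbation term in (9.7).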
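
The constant $2p^3$ comes from elementary counting. The sketch below assumes, as the remark about re-indexing suggests, that the perturbation sum runs over $q = 1, \dots, 2p-1$ and that each index $q$ contributes $q(2p-q)$ pairs $(r, s)$; the helper name is hypothetical. It tabulates the exact count next to the bound $2p^3$.

```python
def perturbation_count(p: int) -> int:
    """Sum of q * (2p - q) over q = 1, ..., 2p - 1.

    Under the indexing assumed here, this counts the (r, s) index pairs
    accumulated over all perturbation terms.
    """
    return sum(q * (2 * p - q) for q in range(1, 2 * p))


for p in range(1, 6):
    # Columns: p, exact count, and the cruder bound 2 p^3.
    print(p, perturbation_count(p), 2 * p ** 3)
```

The exact count equals $p(4p^2 - 1)/3$, so the bound $2p^3$ gives up only a constant factor.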
9.7. A Recursion for the Trace Moments. In view of (9.7) and (9.9), we have shown that

\[ \mu_{2(p+1)} = \sigma^2 \sum_{q=0}^{p} \mu_{2q} \mu_{2(p-q)} \pm 2p^3 \cdot w^4 \cdot \mu_{2(p-1)} \quad \text{for } p = 1, 2, 3, \dots. \tag{9.10} \]

We have written $\pm$ to indicate that the expression contains both a lower bound and an upper bound for the
normalized trace moment $\mu_{2(p+1)}$.

In the next two sections, we will solve this recursion to obtain explicit bounds on the trace moments. First, we 
obtain the upper bound 

\[ \mu_{2p}^{1/(2p)} \le \mathrm{Cat}_p^{1/(2p)} \cdot \sigma + 2^{1/4} p^{5/4} \cdot w \quad \text{for } p = 1, 2, 3, \dots. \tag{9.11} \]

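This result gives us a Khintchine-type inequality. Afterward, assuming $w \le 0.7\sigma$, we establish the lower bound

\[ \mu_{2p}^{1/(2p)} \ge \mathrm{Cat}_p^{1/(2p)} \cdot \sigma \cdot \big[ 1 - (pw/\sigma)^4 \big]_+^{1/(2p)} \quad \text{for } p = 1, 2, 3, \dots. \tag{9.12} \]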

Together these estimates yield the statement of Theorem 9.1. 
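
To get a feel for the size of the two estimates, the following sketch evaluates the right-hand sides of (9.11) and (9.12) for given $p$, $\sigma$, and $w$. The function names are illustrative, the bracket in the lower bound is clipped at zero as an assumption, and the code does not check the hypothesis under which the lower bound is valid.

```python
from math import comb


def catalan(p: int) -> int:
    """p-th Catalan number, Cat_p = binom(2p, p) / (p + 1)."""
    return comb(2 * p, p) // (p + 1)


def upper_bound(p: int, sigma: float, w: float) -> float:
    """Right-hand side of the upper bound (9.11)."""
    return catalan(p) ** (1.0 / (2 * p)) * sigma + 2 ** 0.25 * p ** 1.25 * w


def lower_bound(p: int, sigma: float, w: float) -> float:
    """Right-hand side of the lower bound (9.12), bracket clipped at zero."""
    bracket = max(0.0, 1.0 - (p * w / sigma) ** 4)
    return catalan(p) ** (1.0 / (2 * p)) * sigma * bracket ** (1.0 / (2 * p))


for p in (1, 2, 4, 8):
    print(p, lower_bound(p, 1.0, 0.05), upper_bound(p, 1.0, 0.05))
```

When $w = 0$ the two expressions coincide, and the gap widens as $p w / \sigma$ grows.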


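9.8. Solving the Recursion: Upper Bound. We begin with the proof of the upper bound (9.11). The first step in
the argument is to remove the lag term $\mu_{2(p-1)}$ from the recursion (9.10) using moment comparison. Fix an integer
$p \ge 1$. Observe that

\[ \mu_{2(p-1)} = \mathbb{E}\,\operatorname{\bar{tr}} X^{2(p-1)} \le \mathbb{E} \big( \operatorname{\bar{tr}} X^{2(p+1)} \big)^{(p-1)/(p+1)} \le \big( \mathbb{E}\,\operatorname{\bar{tr}} X^{2(p+1)} \big)^{(p-1)/(p+1)} = \mu_{2(p+1)}^{(p-1)/(p+1)}. \]

The first inequality holds because $q \mapsto (\operatorname{\bar{tr}} A^q)^{1/q}$ is increasing for any positive-semidefinite matrix $A$, while the
second inequality is Lyapunov's. Introduce this estimate into the recursion (9.10) to obtain

\[ \mu_{2(p+1)} \le \sigma^2 \sum_{q=0}^{p} \mu_{2q} \mu_{2(p-q)} + 2p^3 w^4 \cdot \mu_{2(p+1)}^{(p-1)/(p+1)}. \]

This is a polynomial inequality of the form $u^{p+1} \le \alpha + \beta u^{p-1}$ with $u = \mu_{2(p+1)}^{1/(p+1)}$, so Proposition 9.4 ensures that
$u \le \alpha^{1/(p+1)} + \beta^{1/2}$. In other words,

\[ \mu_{2(p+1)}^{1/(p+1)} \le \Big( \sigma^2 \sum_{q=0}^{p} \mu_{2q} \mu_{2(p-q)} \Big)^{1/(p+1)} + \sqrt{2}\, p^{3/2} \cdot w^2 \quad \text{for } p = 1, 2, 3, \dots. \tag{9.13} \]

Using this formula, we will apply induction to prove that

\[ \mu_{2p}^{1/p} \le \mathrm{Cat}_p^{1/p} \cdot \sigma^2 + \sqrt{2}\, p^{5/2} \cdot w^2 \quad \text{when } p = 1, 2, 3, \dots. \tag{9.14} \]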

The stated result (9.11) follows from (9.14) once we take the square root and invoke subadditivity. 
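
For completeness, here is the one-line computation behind this remark, written out under the assumption that (9.14) reads $\mu_{2p}^{1/p} \le \mathrm{Cat}_p^{1/p} \sigma^2 + \sqrt{2}\, p^{5/2} w^2$. Subadditivity refers to $\sqrt{a + b} \le \sqrt{a} + \sqrt{b}$ for $a, b \ge 0$.

```latex
\begin{align*}
\mu_{2p}^{1/(2p)} = \big( \mu_{2p}^{1/p} \big)^{1/2}
  &\le \big( \mathrm{Cat}_p^{1/p} \sigma^2 + \sqrt{2}\, p^{5/2} w^2 \big)^{1/2} \\
  &\le \mathrm{Cat}_p^{1/(2p)} \sigma + 2^{1/4} p^{5/4} w.
\end{align*}
```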

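Let us commence the induction. The formula (9.14) holds for $p = 1$ because $\mu_2 = \sigma^2$, as noted in (9.6). Assuming
that the bound (9.14) holds for each integer $p$ in the range $1, 2, 3, \dots, r$, we will verify that the same bound is also
valid for $p = r + 1$. For any integer $q$ in the range $1 \le q \le r$, the bound (9.14) implies that

\[ \mu_{2q} \le \mathrm{Cat}_q \cdot \sigma^{2q} \Big[ 1 + \frac{\sqrt{2}\, q^{5/2} w^2}{\mathrm{Cat}_q^{1/q} \sigma^2} \Big]^q \le \mathrm{Cat}_q \cdot \sigma^{2q} \cdot \exp \Big( \frac{\sqrt{2}\, q^{7/2} w^2}{\mathrm{Cat}_q^{1/q} \sigma^2} \Big). \tag{9.15} \]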


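Using the definition (9.3) of the Catalan numbers, one may verify that $q \mapsto q^{5/2} / \mathrm{Cat}_q^{1/q}$ is increasing, so

\[ \mu_{2q} \le \mathrm{Cat}_q \cdot \sigma^{2q} \Big[ 1 + \frac{\sqrt{2}\, (r+1)^{5/2} w^2}{\mathrm{Cat}_{r+1}^{1/(r+1)} \sigma^2} \Big]^q \quad \text{for } q = 0, 1, 2, \dots, r. \]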


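The case $q = 0$ follows by inspection. Now, the latter bound and the recursion (9.4) for Catalan numbers together
imply that

\[ \sigma^2 \sum_{q=0}^{r} \mu_{2q} \mu_{2(r-q)} \le \mathrm{Cat}_{r+1} \cdot \sigma^{2(r+1)} \Big[ 1 + \frac{\sqrt{2}\, (r+1)^{5/2} w^2}{\mathrm{Cat}_{r+1}^{1/(r+1)} \sigma^2} \Big]^r. \]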

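Take the $(r+1)$th root to determine that

\[
\begin{aligned}
\Big( \sigma^2 \sum_{q=0}^{r} \mu_{2q} \mu_{2(r-q)} \Big)^{1/(r+1)}
&\le \mathrm{Cat}_{r+1}^{1/(r+1)} \cdot \sigma^2 \Big[ 1 + \frac{r}{r+1} \cdot \frac{\sqrt{2}\, (r+1)^{5/2} w^2}{\mathrm{Cat}_{r+1}^{1/(r+1)} \sigma^2} \Big] \\
&= \mathrm{Cat}_{r+1}^{1/(r+1)} \cdot \sigma^2 + \sqrt{2}\, (r+1)^{5/2} w^2 - \sqrt{2}\, (r+1)^{3/2} w^2.
\end{aligned}
\]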


We have used the numerical inequality $(1 + x)^a \le 1 + ax$, valid for $0 \le a \le 1$ and $x \ge 0$. Combine this estimate with
the recursive bound (9.13) for $p = r$ to obtain

\[ \mu_{2(r+1)}^{1/(r+1)} \le \mathrm{Cat}_{r+1}^{1/(r+1)} \cdot \sigma^2 + \sqrt{2}\, (r+1)^{5/2} w^2. \]

We see that (9.14) holds for $p = r + 1$, and the induction may proceed.
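
Two elementary facts carry the induction step: a monotonicity property of the Catalan numbers and a Bernoulli-type inequality. The sketch below checks both numerically. It assumes that the monotone quantity is $q \mapsto q^{5/2} / \mathrm{Cat}_q^{1/q}$, which is the form needed to pass from (9.15) to the next display; the helper names are hypothetical, and a finite check is of course not a proof.

```python
from math import comb


def catalan(p: int) -> int:
    """p-th Catalan number, Cat_p = binom(2p, p) / (p + 1)."""
    return comb(2 * p, p) // (p + 1)


def ratio(q: int) -> float:
    """q^(5/2) / Cat_q^(1/q), the quantity assumed to be increasing in q."""
    return q ** 2.5 / catalan(q) ** (1.0 / q)


ratios = [ratio(q) for q in range(1, 41)]
is_increasing = all(a < b for a, b in zip(ratios, ratios[1:]))


def bernoulli_gap(x: float, a: float) -> float:
    """(1 + a*x) - (1 + x)**a, which is nonnegative for 0 <= a <= 1, x >= 0."""
    return (1.0 + a * x) - (1.0 + x) ** a


print(is_increasing, min(bernoulli_gap(x / 4.0, 0.75) for x in range(40)))
```

In the proof, the Bernoulli inequality is applied with exponent $a = r/(r+1)$.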

9.9. Solving the Recursion: Lower Bound. We turn to the proof of the lower bound (9.12). Assuming that
$w \le 0.7\sigma$, as in the statement of (9.12), we will use induction to show that

\[ \mu_{2p} \ge \sigma^{2p} \cdot \mathrm{Cat}_p \cdot \big[ 1 - (pw/\sigma)^4 \big] \quad \text{for } p = 0, 1, 2, 3, \dots. \tag{9.16} \]

The result (9.12) follows once we take the $(2p)$th root.

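To begin the induction, recall that $\mu_0 = 1$, so the formula (9.16) is valid for $p = 0$. Suppose now that (9.16) is
valid for each integer $p$ in the range $0, 1, 2, \dots, r$. We will verify the formula for $p = r + 1$. The lower branch of the
recursion (9.10) states that

\[ \mu_{2(r+1)} \ge \sigma^2 \sum_{q=0}^{r} \mu_{2q} \mu_{2(r-q)} - 2r^3 w^4 \cdot \mu_{2(r-1)}. \]

The induction hypothesis (9.16) yields

\[
\begin{aligned}
\sigma^2 \sum_{q=0}^{r} \mu_{2q} \mu_{2(r-q)}
&\ge \sigma^{2(r+1)} \sum_{q=0}^{r} \mathrm{Cat}_q \mathrm{Cat}_{r-q} \cdot \big[ 1 - (qw/\sigma)^4 - ((r-q)w/\sigma)^4 \big] \\
&\ge \sigma^{2(r+1)} \cdot \mathrm{Cat}_{r+1} \cdot \big[ 1 - (rw/\sigma)^4 \big].
\end{aligned}
\]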


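We have used the fact that $q \mapsto q^4 + (r - q)^4$ achieves its maximum value on $[0, r]$ at one of the endpoints because
of convexity. We also applied the recursive formula (9.4) for the Catalan numbers. The bound (9.15) implies that

\[ \mu_{2(r-1)} \le 2\, \mathrm{Cat}_{r-1} \cdot \sigma^{2(r-1)} \]

under the standing assumption on the ratio $w/\sigma$, whose threshold is $(\log 2 / \sqrt{2})^{1/2} \approx 0.7$.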


Therefore, 

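\[ 2r^3 w^4 \cdot \mu_{2(r-1)} \le 4\, \mathrm{Cat}_{r-1} \cdot \sigma^{2(r-1)} \cdot r^3 w^4 \le 4\, \mathrm{Cat}_{r+1} \cdot \sigma^{2(r+1)} \cdot r^3 (w/\sigma)^4. \]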

The second inequality holds because the Catalan numbers are nondecreasing. Combine the last three displays to 
arrive at 

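\[ \mu_{2(r+1)} \ge \mathrm{Cat}_{r+1} \cdot \sigma^{2(r+1)} \cdot \big[ 1 - (r^4 + 4r^3)(w/\sigma)^4 \big] \ge \mathrm{Cat}_{r+1} \cdot \sigma^{2(r+1)} \cdot \big[ 1 - (r+1)^4 (w/\sigma)^4 \big]. \]

We have verified the formula (9.16) for $p = r + 1$, which completes the proof.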
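
The lower-bound induction rests on three small numerical facts: the endpoint maximum of $q^4 + (r-q)^4$, the comparison $r^4 + 4r^3 \le (r+1)^4$, and the value of the constant $(\log 2/\sqrt{2})^{1/2}$. The following sketch checks them on integer grids; the function names are illustrative, and the grid check is only a sanity test of the continuous convexity argument.

```python
from math import log, sqrt


def endpoint_max_holds(r: int) -> bool:
    """Check q^4 + (r - q)^4 <= r^4 at every integer q in [0, r]."""
    return all(q ** 4 + (r - q) ** 4 <= r ** 4 for q in range(r + 1))


def binomial_step_holds(r: int) -> bool:
    """Check r^4 + 4 r^3 <= (r + 1)^4, the last step of the induction."""
    return r ** 4 + 4 * r ** 3 <= (r + 1) ** 4


# The constant that appears next to the bound on mu_{2(r-1)}.
threshold = sqrt(log(2.0) / sqrt(2.0))

print(all(endpoint_max_holds(r) for r in range(30)),
      all(binomial_step_holds(r) for r in range(30)),
      threshold)
```

The constant exceeds $0.7$ only slightly, which explains the appearance of $0.7$ in the hypothesis.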


Appendix A. Interpolation Results 

In this appendix, we establish Proposition 8.3, the interpolation inequality for a multilinear function of a random
matrix, whose proof appears below in Appendix A.4.

A.1. Multivariate Complex Interpolation. The interpolation result we use in the body of the paper is a consequence
of a more general theorem on interpolation for a function of several complex variables.

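Proposition A.1 (Multivariate Complex Interpolation). Let $k$ be a natural number. For a positive number $\alpha$, define
the simplicial prism

\[ \Delta_k(\alpha) := \Big\{ (c_1, \dots, c_k) \in \mathbb{C}^k : \operatorname{Re} c_i \ge 0 \text{ for each } i \text{ and } \sum_{i=1}^k \operatorname{Re} c_i \le \alpha \Big\}. \]

Consider a bounded, continuous function $G : \Delta_k(\alpha) \to \mathbb{C}$. For each pair $(i, j)$ of distinct indices and each $c \in \Delta_k(\alpha)$,
assume that $G$ has the analytic section property:

\[ z \mapsto G(c_1, \dots, c_i + z, \dots, c_j - z, \dots, c_k) \quad \text{is analytic on } -\operatorname{Re} c_i < \operatorname{Re} z < \operatorname{Re} c_j. \tag{A.1} \]

Then, for each $z \in \Delta_k(\alpha)$ with $\beta := \sum_{i=1}^k \operatorname{Re} z_i > 0$,

\[ |G(z_1, \dots, z_k)| \le \Big( \prod_{i=1}^{k} \sup_{t_\ell \in \mathbb{R}} \big| G(\mathrm{i}t_1, \dots, \mathrm{i}t_{i-1}, \beta + \mathrm{i}t_i, \mathrm{i}t_{i+1}, \dots, \mathrm{i}t_k) \big|^{\operatorname{Re} z_i} \Big)^{1/\beta}. \tag{A.2} \]

We establish Proposition A.1 in the next two sections. The argument relies on the same principles that support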
standard univariate complex interpolation. Although it seems likely that a result of this form already appears in 
the literature, we were not able to locate a reference. 
A.2. Preliminaries. Proposition A.1 depends on Hadamard's Three-Lines Theorem [Gar07, Prop. 9.1.1].

Proposition A.2 (Three-Lines Theorem). Consider the vertical strip $\Delta_1(1)$ in the complex plane:

\[ \Delta_1(1) := \{ z \in \mathbb{C} : 0 \le \operatorname{Re} z \le 1 \}. \]

Consider a bounded, continuous function $f : \Delta_1(1) \to \mathbb{C}$ that is analytic in the interior of $\Delta_1(1)$. For each $\theta \in [0, 1]$,

\[ \sup_{t \in \mathbb{R}} |f(\theta + \mathrm{i}t)| \le \sup_{t \in \mathbb{R}} |f(1 + \mathrm{i}t)|^{\theta} \cdot \sup_{t \in \mathbb{R}} |f(\mathrm{i}t)|^{1 - \theta}. \]

As we will see, this result delivers the $k = 2$ case of Proposition A.1.
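
A quick numerical illustration of the Three-Lines Theorem may help fix ideas. The sketch below uses the test function $f(z) = 1/(z+1)$, chosen here only as an example that is bounded and analytic on the strip, and approximates each supremum on a finite grid of $t$ values; the helper names are hypothetical.

```python
def sup_on_line(f, x: float, ts) -> float:
    """Approximate the supremum over t of |f(x + i t)| on a finite grid."""
    return max(abs(f(complex(x, t))) for t in ts)


def f(z: complex) -> complex:
    # Bounded and analytic on the strip 0 <= Re z <= 1; the pole sits at z = -1.
    return 1.0 / (z + 1.0)


# The grid contains t = 0, where |f| attains its maximum on each vertical line.
ts = [k / 10.0 for k in range(-200, 201)]


def three_lines_gap(theta: float) -> float:
    """Right-hand side minus left-hand side of the three-lines inequality."""
    lhs = sup_on_line(f, theta, ts)
    rhs = sup_on_line(f, 1.0, ts) ** theta * sup_on_line(f, 0.0, ts) ** (1.0 - theta)
    return rhs - lhs


print([round(three_lines_gap(k / 4.0), 4) for k in range(5)])
```

For this example the inequality reduces to $1/(1+\theta) \le 2^{-\theta}$ on $[0, 1]$, with equality at the two endpoints.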

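A.3. Proof of Proposition A.1. The proof of the multivariate interpolation result, Proposition A.1, follows by induction
on the number $k$ of arguments.

Let us begin with the base cases. When the function $G$ has one argument only, the inequality (A.2) is obviously
true. Next, consider a bivariate function $G_2 : \Delta_2(\alpha) \to \mathbb{C}$ that is bounded and continuous and has the analytic
section property (A.1). Fix a point $z \in \Delta_2(\alpha)$ with $\beta := \operatorname{Re} z_1 + \operatorname{Re} z_2 > 0$. Define the bounded, continuous function

\[ f(y) := G_2\big( \beta y + \mathrm{i} \operatorname{Im} z_1, \ \beta(1 - y) + \mathrm{i} \operatorname{Im} z_2 \big) \quad \text{for } y \in \Delta_1(1). \]

The assumption (A.1) implies that $f$ is analytic on $0 < \operatorname{Re} y < 1$. Select $\theta = \operatorname{Re} z_1 / \beta$, which gives $1 - \theta = \operatorname{Re} z_2 / \beta$. An
application of the Three-Lines Theorem, Proposition A.2, implies that

\[ |G_2(z_1, z_2)| = |f(\theta)| \le \sup_{t \in \mathbb{R}} |f(1 + \mathrm{i}t)|^{\theta} \cdot \sup_{t \in \mathbb{R}} |f(\mathrm{i}t)|^{1 - \theta}. \]

Introducing the definition of $f$ and simplifying,

\[
\begin{aligned}
|G_2(z_1, z_2)| &\le \sup_{t \in \mathbb{R}} \big| G_2\big( \beta(1 + \mathrm{i}t) + \mathrm{i} \operatorname{Im} z_1, \ -\beta \mathrm{i}t + \mathrm{i} \operatorname{Im} z_2 \big) \big|^{\operatorname{Re} z_1 / \beta} \\
&\qquad \times \sup_{t \in \mathbb{R}} \big| G_2\big( \beta \mathrm{i}t + \mathrm{i} \operatorname{Im} z_1, \ \beta(1 - \mathrm{i}t) + \mathrm{i} \operatorname{Im} z_2 \big) \big|^{\operatorname{Re} z_2 / \beta} \\
&\le \Big( \sup_{s_1, s_2 \in \mathbb{R}} |G_2(\beta + \mathrm{i}s_1, \ \mathrm{i}s_2)| \Big)^{\operatorname{Re} z_1 / \beta} \cdot \Big( \sup_{s_1, s_2 \in \mathbb{R}} |G_2(\mathrm{i}s_1, \ \beta + \mathrm{i}s_2)| \Big)^{\operatorname{Re} z_2 / \beta}.
\end{aligned}
\tag{A.3}
\]

This is the $k = 2$ case of Proposition A.1.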

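Fix a positive integer $k$, and suppose that we have established the inequality (A.2) for functions with $k - 1$ arguments.
In other words, assume that $G_{k-1} : \Delta_{k-1}(\alpha') \to \mathbb{C}$ is bounded and continuous, and it has the analytic section
property (A.1). Then, for $z \in \Delta_{k-1}(\alpha')$ with $\beta' := \sum_{i=1}^{k-1} \operatorname{Re} z_i > 0$,

\[ |G_{k-1}(z_1, \dots, z_{k-1})| \le \Big( \prod_{i=1}^{k-1} \sup_{t_\ell \in \mathbb{R}} \big| G_{k-1}(\mathrm{i}t_1, \dots, \beta' + \mathrm{i}t_i, \dots, \mathrm{i}t_{k-1}) \big|^{\operatorname{Re} z_i} \Big)^{1/\beta'}. \tag{A.4} \]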

We need to extend this result to functions with k variables. 

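Consider a bounded, continuous function $G_k : \Delta_k(\alpha) \to \mathbb{C}$ with the analytic section property (A.1). Fix a complex
vector $z \in \Delta_k(\alpha)$ with $\beta := \sum_{i=1}^{k} \operatorname{Re} z_i > 0$. Define the number $\beta' := \sum_{i=1}^{k-1} \operatorname{Re} z_i$. When $\beta' = 0$, the formula (A.2) is
trivial for $G = G_k$ because $\operatorname{Re} z_1 = \dots = \operatorname{Re} z_{k-1} = 0$. Therefore, we may assume that $\beta' > 0$. Introduce the function

\[ G_{k-1}(y_1, \dots, y_{k-1}) := G_k(y_1, \dots, y_{k-1}, z_k) \quad \text{for } y \in \Delta_{k-1}(\beta'). \]

One may verify that $G_{k-1}$ inherits boundedness, continuity, and analytic sections from $G_k$. Therefore, the induction
hypothesis (A.4) gives

\[ |G_k(z_1, \dots, z_{k-1}, z_k)| \le \Big( \prod_{i=1}^{k-1} \sup_{t_\ell \in \mathbb{R}} \big| G_k(\mathrm{i}t_1, \dots, \beta' + \mathrm{i}t_i, \dots, \mathrm{i}t_{k-1}, z_k) \big|^{\operatorname{Re} z_i} \Big)^{1/\beta'}. \tag{A.5} \]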

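For each fixed choice of the index $i$ and of the numbers $t_1, \dots, t_{k-1} \in \mathbb{R}$, consider the function

\[ G_2(y_i, y_k) := G_k(\mathrm{i}t_1, \dots, y_i, \dots, \mathrm{i}t_{k-1}, y_k) \quad \text{for } (y_i, y_k) \in \Delta_2(\alpha). \]

Since $\beta' + \operatorname{Re} z_k = \beta$, the bivariate case (A.3) provides that

\[
\begin{aligned}
\big| G_k(\mathrm{i}t_1, \dots, \beta' + \mathrm{i}t_i, \dots, \mathrm{i}t_{k-1}, z_k) \big|
&\le \sup_{s_i, s_k \in \mathbb{R}} \big| G_k(\mathrm{i}t_1, \dots, \beta + \mathrm{i}s_i, \dots, \mathrm{i}t_{k-1}, \mathrm{i}s_k) \big|^{\beta'/\beta} \\
&\qquad \times \sup_{s_i, s_k \in \mathbb{R}} \big| G_k(\mathrm{i}t_1, \dots, \mathrm{i}s_i, \dots, \mathrm{i}t_{k-1}, \beta + \mathrm{i}s_k) \big|^{\operatorname{Re} z_k / \beta}.
\end{aligned}
\tag{A.6}
\]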
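Combine the bounds (A.5) and (A.6) to reach

\[
\begin{aligned}
|G_k(z_1, \dots, z_{k-1}, z_k)| &\le \Big( \prod_{i=1}^{k-1} \sup_{t_\ell \in \mathbb{R}} \big| G_k(\mathrm{i}t_1, \dots, \beta + \mathrm{i}t_i, \dots, \mathrm{i}t_k) \big|^{\operatorname{Re} z_i} \Big)^{1/\beta} \\
&\qquad \times \Big( \prod_{i=1}^{k-1} \sup_{t_\ell \in \mathbb{R}} \big| G_k(\mathrm{i}t_1, \dots, \mathrm{i}t_i, \dots, \beta + \mathrm{i}t_k) \big|^{\operatorname{Re} z_i \cdot \operatorname{Re} z_k / \beta'} \Big)^{1/\beta}.
\end{aligned}
\]

Since $\beta' = \sum_{i=1}^{k-1} \operatorname{Re} z_i$ and $\beta - \beta' = \operatorname{Re} z_k$, we see that the second product has the same form as the $i = k$ term of the
first product. Thus,

\[ |G_k(z_1, \dots, z_k)| \le \Big( \prod_{i=1}^{k} \sup_{t_\ell \in \mathbb{R}} \big| G_k(\mathrm{i}t_1, \dots, \beta + \mathrm{i}t_i, \dots, \mathrm{i}t_k) \big|^{\operatorname{Re} z_i} \Big)^{1/\beta}. \]

This step completes the induction. We have established Proposition A.1.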

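A.4. Interpolation for a Multilinear Function of Random Matrices. We are now prepared to establish the interpolation
result, Proposition 8.3, for a multilinear function of random matrices. We will actually establish a somewhat
more precise version, which we state here.

Proposition A.3 (Refined Multilinear Interpolation). Suppose that $F : (\mathbb{M}_d)^k \to \mathbb{C}$ is a multilinear function. Fix
nonnegative integers $\alpha_1, \dots, \alpha_k$ with $\sum_{i=1}^{k} \alpha_i = \alpha$. Let $Y_i \in \mathbb{H}_d$ be random Hermitian matrices, not necessarily
independent, for which $\mathbb{E} \|Y_i\|^{\alpha} < \infty$. Then

\[ \big| \mathbb{E}\, F(Y_1^{\alpha_1}, \dots, Y_k^{\alpha_k}) \big| \le \Big( \prod_{i=1}^{k} \Big[ \mathbb{E} \max_{Q_\ell} \big| F(Q_1, \dots, Q_{i-1}, Q_i Y_i^{\alpha}, Q_{i+1}, \dots, Q_k) \big| \Big]^{\alpha_i} \Big)^{1/\alpha}. \]

Each $Q_\ell$ is a unitary matrix that commutes with $Y_\ell$.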

We establish this result in the next section. 

Observe that Proposition A.3 immediately implies Proposition 8.3, the interpolation result that we use in the 
body of the paper. Indeed, we recognize the large parenthesis on the right-hand side as a geometric mean, and we 
bound the geometric mean by the maximum of its components. 
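
In dimension $d = 1$ with $k = 2$ and $F(x, y) = xy$, the unitary factors are unit-modulus scalars, and Proposition A.3 reduces to a generalized Hölder inequality for real random variables. The sketch below checks this scalar case exactly on a small discrete distribution; the sample values, probabilities, and function names are made up for illustration.

```python
def expectation(values, probs) -> float:
    """Expectation of a discrete random variable given values and weights."""
    return sum(v * p for v, p in zip(values, probs))


def scalar_interpolation_sides(samples, probs, a1: int, a2: int):
    """Return (lhs, rhs) of the d = 1, k = 2 case with F(x, y) = x * y.

    lhs = |E[Y1^a1 * Y2^a2]|
    rhs = ((E|Y1|^a)^a1 * (E|Y2|^a)^a2)^(1/a), where a = a1 + a2 > 0.
    """
    a = a1 + a2
    lhs = abs(expectation([y1 ** a1 * y2 ** a2 for y1, y2 in samples], probs))
    m1 = expectation([abs(y1) ** a for y1, _ in samples], probs)
    m2 = expectation([abs(y2) ** a for _, y2 in samples], probs)
    rhs = (m1 ** a1 * m2 ** a2) ** (1.0 / a)
    return lhs, rhs


# A dependent pair (Y1, Y2) supported on four points.
samples = [(1.0, -2.0), (-0.5, 3.0), (2.0, 0.5), (-1.5, -1.0)]
probs = [0.1, 0.4, 0.3, 0.2]

lhs, rhs = scalar_interpolation_sides(samples, probs, a1=2, a2=3)
print(lhs, rhs)
```

The right-hand side is a weighted geometric mean of the two moments $\mathbb{E}|Y_1|^{\alpha}$ and $\mathbb{E}|Y_2|^{\alpha}$, so it is at most their maximum, which is the scalar analogue of the passage from Proposition A.3 to Proposition 8.3.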

A.5. Proof of Proposition A.3. By a perturbative argument, we may assume that each matrix $Y_i$ is almost surely
nonsingular. Indeed, for a parameter $\varepsilon > 0$, we can replace each $Y_i$ by the modified matrix $\tilde{Y}_i := Y_i + \varepsilon \gamma_i \mathbf{I}$, where
$\{\gamma_i\}$ is an independent family of standard normal variables. After completing the argument, we can draw $\varepsilon$ down
to zero to obtain the inequality for the original random matrices.

The first step in the argument is to perform a polar factorization of each random Hermitian matrix: $Y_i = U_i P_i$
where $U_i$ is unitary, $P_i$ is almost surely positive definite, and the two factors commute with $Y_i$ for each index $i$. For
clarity of argument, we introduce the unitary matrices $S_i = U_i^{\alpha_i}$. With this notation,

\[ \big| \mathbb{E}\, F(Y_1^{\alpha_1}, \dots, Y_k^{\alpha_k}) \big| = \big| \mathbb{E}\, F(S_1 P_1^{\alpha_1}, \dots, S_k P_k^{\alpha_k}) \big|. \tag{A.7} \]

We will perform interpolation only on the positive-definite matrices.

Next, we introduce a complex-valued function by replacing the powers $\alpha_i$ with complex variables:

\[ G : (z_1, \dots, z_k) \mapsto \mathbb{E}\, F(S_1 P_1^{z_1}, \dots, S_k P_k^{z_k}) \quad \text{for } z \in \Delta_k(\alpha). \]

The set $\Delta_k(\alpha)$ is the simplicial prism defined in the statement of Proposition A.1.

Claim A.4. The function G is bounded, continuous, and has analytic sections. 

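These are the properties required to apply the interpolation result, Proposition A.1.

Let us assume that Claim A.4 holds so that we can complete the proof. The relation (A.7) and Proposition A.1
imply that

\[ \big| \mathbb{E}\, F(Y_1^{\alpha_1}, \dots, Y_k^{\alpha_k}) \big| = \big| \mathbb{E}\, F(S_1 P_1^{\alpha_1}, \dots, S_k P_k^{\alpha_k}) \big| \le \Big( \prod_{i=1}^{k} \sup_{t_\ell \in \mathbb{R}} \big| \mathbb{E}\, F(S_1 P_1^{\mathrm{i}t_1}, \dots, S_i P_i^{\alpha + \mathrm{i}t_i}, \dots, S_k P_k^{\mathrm{i}t_k}) \big|^{\alpha_i} \Big)^{1/\alpha}. \]

Fix an index $i$ in the product. Introduce the unitary matrix $Q_{ii}(t_i) := S_i P_i^{\mathrm{i}t_i} U_i^{-\alpha}$, where $U_i$ is the polar factor of $Y_i$.
It follows that

\[ S_i P_i^{\alpha + \mathrm{i}t_i} = \big( S_i P_i^{\mathrm{i}t_i} U_i^{-\alpha} \big) \big( U_i^{\alpha} P_i^{\alpha} \big) = Q_{ii}(t_i)\, Y_i^{\alpha}. \]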
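Similarly, for each $\ell \ne i$, we can define $Q_{i\ell}(t_\ell) := S_\ell P_\ell^{\mathrm{i}t_\ell}$. Therefore,

\[
\begin{aligned}
\big| \mathbb{E}\, F(Y_1^{\alpha_1}, \dots, Y_k^{\alpha_k}) \big|
&\le \Big( \prod_{i=1}^{k} \sup_{t_\ell \in \mathbb{R}} \big| \mathbb{E}\, F\big( Q_{i1}(t_1), \dots, Q_{ii}(t_i) Y_i^{\alpha}, \dots, Q_{ik}(t_k) \big) \big|^{\alpha_i} \Big)^{1/\alpha} \\
&\le \Big( \prod_{i=1}^{k} \Big[ \mathbb{E} \max_{Q_\ell} \big| F(Q_1, \dots, Q_i Y_i^{\alpha}, \dots, Q_k) \big| \Big]^{\alpha_i} \Big)^{1/\alpha}.
\end{aligned}
\]

By construction, $Q_{i\ell}(t_\ell)$ commutes with $Y_\ell$ for each index $\ell$. In the second line, we apply Jensen's inequality. Then
we relax the supremum to include all unitary matrices $Q_\ell$ that commute with the corresponding $Y_\ell$. We can replace
the supremum with a maximum since the unitary group is compact and the function $F$ is continuous. This is what
we needed to show.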

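Finally, we must verify Claim A.4. Each multilinear function $F : (\mathbb{M}_d)^k \to \mathbb{C}$ is bounded and continuous:

\[ |F(A_1, \dots, A_k)| \le \mathrm{Const} \cdot \prod_{i=1}^{k} \|A_i\|. \]

Fix a point $z \in \Delta_k(\alpha)$, and let $\beta := \sum_{i=1}^{k} \operatorname{Re} z_i$. Applying this observation to the function $G$,

\[
\begin{aligned}
|G(z_1, \dots, z_k)| = \big| \mathbb{E}\, F(S_1 P_1^{z_1}, \dots, S_k P_k^{z_k}) \big|
&\le \mathrm{Const} \cdot \mathbb{E} \Big[ \prod_{i=1}^{k} \big\| S_i P_i^{z_i} \big\| \Big] \\
&= \mathrm{Const} \cdot \mathbb{E} \Big[ \prod_{i=1}^{k} \|Y_i\|^{\operatorname{Re} z_i} \Big] \\
&\le \mathrm{Const} \cdot \mathbb{E} \Big[ \sum_{i=1}^{k} \frac{\operatorname{Re} z_i}{\beta} \|Y_i\|^{\beta} \Big].
\end{aligned}
\]

The first estimate follows from Jensen's inequality and the bound on the multilinear function $F$. The second relation
depends on the unitary invariance of the spectral norm, the identity $\|P^z\| = \|P\|^{\operatorname{Re} z}$, and the polar
decomposition $Y_i = U_i P_i$. The last bound is the inequality between the geometric and arithmetic mean. Since
$\mathbb{E} \|Y_i\|^{\alpha} < \infty$ and $\beta \le \alpha$, we conclude that $G$ is bounded. Since $F$ is continuous, an application of the Dominated
Convergence Theorem shows that $G$ is a continuous function as well.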

The proof that $G$ has analytic sections is similar. Fix a vector $c \in \Delta_k(\alpha)$. Since $F$ is multilinear, it is easy to check
that the map

\[ z \mapsto F\big( S_1 P_1^{c_1}, \dots, S_i P_i^{c_i + z}, \dots, S_j P_j^{c_j - z}, \dots, S_k P_k^{c_k} \big) \quad \text{is analytic on } -\operatorname{Re} c_i < \operatorname{Re} z < \operatorname{Re} c_j \]

for any fixed choice of $S_\ell$ and $P_\ell$ and any pair $(i, j)$ of distinct indices. Together, the Morera Theorem and the
Fubini-Tonelli Theorem allow us to conclude that

\[ z \mapsto \mathbb{E}\, F\big( S_1 P_1^{c_1}, \dots, S_i P_i^{c_i + z}, \dots, S_j P_j^{c_j - z}, \dots, S_k P_k^{c_k} \big) \]

also is analytic. Therefore, the analytic section property (A.1) is in force. Claim A.4 is established.